One advantage of fixed-length rows is that a file of them
can be mapped as a matrix, rather than having to deal with
CSV as variable-length pieces of a stream.
- joey
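To illustrate the fixed-length point, here is a minimal Python sketch (not J, and not from the thread). It assumes a hypothetical binary layout of two 64-bit integers and one 64-bit float per record; the file name and helper functions are made up for the example. Because every record has the same size, record k sits at byte offset k times the record size and can be read without scanning the file.

```python
import mmap
import os
import struct
import tempfile

# Assumed record layout (not from the thread): row index, column index, value.
RECORD = struct.Struct("<qqd")  # two int64 and one float64 = 24 bytes


def write_records(path, triples):
    """Write (row, col, value) triples as fixed-length binary records."""
    with open(path, "wb") as f:
        for row, col, value in triples:
            f.write(RECORD.pack(row, col, value))


def read_record(mm, k):
    """Return record k by computing its byte offset; no scanning is needed."""
    return RECORD.unpack_from(mm, k * RECORD.size)


# Hypothetical file name in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "matrix.bin")
write_records(path, [(0, 0, 4.0), (0, 1, 1.0), (1, 1, 3.0)])

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    count = len(mm) // RECORD.size   # number of records in the file
    third = read_record(mm, 2)       # random access to the third record
    mm.close()
```

With CSV, finding record k would require parsing every line before it, since line lengths vary.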
At 11:32 +0800 2008/01/04, Alex Rufon wrote:
Hi Nick,
If your only choice is between fixed length and CSV ... I'd go with CSV.
I do not actually know if it's possible, but maybe
you can store your data in SQLite or another
database. That way you can restrict each query
to only the data you need by constructing clever
SQL statements (although since your columns are
the coordinates, that may be a pain). I believe
the problem is not in storing the data but
in retrieving the right amount of data to fit
into your memory/work space.
r/Alex
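A rough sketch of the database idea, in Python with the standard sqlite3 module. The table name, column names, and the rows_block helper are assumptions for illustration only; the thread does not specify a schema.

```python
import sqlite3

# In-memory database for the example; a real data set would use a file on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one record per nonzero cell, keyed by its coordinates.
conn.execute(
    "CREATE TABLE cells (r INTEGER, c INTEGER, v REAL, PRIMARY KEY (r, c))"
)
conn.executemany(
    "INSERT INTO cells VALUES (?, ?, ?)",
    [(0, 0, 4.0), (0, 1, 1.0), (1, 1, 3.0), (2, 0, 2.0)],
)


def rows_block(conn, first, last):
    """Fetch only the cells whose row index lies in [first, last]."""
    query = "SELECT r, c, v FROM cells WHERE r BETWEEN ? AND ? ORDER BY r, c"
    return conn.execute(query, (first, last)).fetchall()


block = rows_block(conn, 0, 1)  # cells of rows 0 and 1 only
```

The primary key on (r, c) lets the database locate a range of rows through its index, so only the requested slice has to be brought into memory.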
-----Original Message-----
From: Nick Kostirya [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 03, 2008 9:23 PM
To: General forum
Cc: Alex Rufon
Subject: Re: [Jgeneral] Successful stories
On Thu, 3 Jan 2008 17:58:59 +0800
"Alex Rufon" <[EMAIL PROTECTED]> wrote:
...
Nick, just tell us what you are thinking of doing. Maybe we can help.
There are a lot of brilliant people here on the list, and maybe we can
help things move along for you. ;)
Thanks a lot.
I have two goals.
The first one amounts to solving a giant system of linear equations.
The matrix A in the equation Y=A*X is rather sparse, but large.
The size of matrix A is 100 million by 100 million; however, each
row has only around 100 nonzero cells.
The matrix is described in a file with 3 columns.
The first two columns contain the cell's coordinates, the third column
contains the value. That is, the file contains 10 billion rows. I
haven't decided yet which format is better for storing the data in one
file. It can be either fixed-length rows or CSV. Which is better from
the J viewpoint?
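The sizes involved can be checked with a few lines of arithmetic (Python here, not J). The 24-byte record size is an assumption, two 64-bit indices plus one 64-bit value; it is not stated in the thread.

```python
rows = 100_000_000            # matrix dimension given in the message
nonzeros_per_row = 100        # approximate nonzero cells per row

entries = rows * nonzeros_per_row   # records in the 3-column file
bytes_per_record = 8 + 8 + 8        # assumed: int64 row, int64 col, float64 value
total_gb = entries * bytes_per_record / 1e9

dense_cells = rows * rows     # cells a dense matrix of this size would hold

print(entries)       # 10000000000
print(total_gb)      # 240.0
print(dense_cells)   # 10000000000000000
```

Under that assumption the file is roughly 240 GB, so whichever format is chosen, the data has to be processed in pieces rather than loaded whole.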
That said, the task consists of the following:
1) read the file and generate the matrix
2) solve the system of linear equations
3) save the result
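The three steps above can be sketched on a toy scale as follows (Python, not J). This is only an illustration under stated assumptions: the input is CSV triplets, and the solver is plain Jacobi iteration, which is known to converge when A is strictly diagonally dominant. The thread says nothing about the properties of A, so the choice of solver and all function names here are hypothetical.

```python
import csv
import io


def read_triplets(stream):
    """Step 1: build a row-wise sparse matrix {row: {col: value}} from CSV triplets."""
    rows = {}
    for r, c, v in csv.reader(stream):
        rows.setdefault(int(r), {})[int(c)] = float(v)
    return rows


def jacobi_solve(rows, y, sweeps=100):
    """Step 2: approximate X in Y = A*X with Jacobi iteration.

    Assumes every row has a nonzero diagonal entry and that A is
    strictly diagonally dominant (an assumption, not stated in the thread).
    """
    x = [0.0] * len(y)
    for _ in range(sweeps):
        x = [
            (y[i] - sum(v * x[j] for j, v in rows[i].items() if j != i)) / rows[i][i]
            for i in range(len(y))
        ]
    return x


def write_result(stream, x):
    """Step 3: save the solution, one value per line."""
    for value in x:
        stream.write(f"{value!r}\n")


# Toy 2-by-2 system: 4*x0 + x1 = 6 and 2*x0 + 5*x1 = 12.
data = io.StringIO("0,0,4.0\n0,1,1.0\n1,0,2.0\n1,1,5.0\n")
a = read_triplets(data)
x = jacobi_solve(a, [6.0, 12.0])
out = io.StringIO()
write_result(out, x)
```

Only the nonzero cells are stored and touched in each sweep, which is the property that matters for a matrix with about 100 nonzeros per row. At the real problem size the dictionary would not fit in memory, so the rows would have to be streamed from disk.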
In a word, it's simple, but for a newbie in J it's hard to determine
the best way for now. For example, the process of "Connection matrix"
generation described here
http://www.jsoftware.com/help/dictionary/samp20.htm doesn't seem quite
optimal to me for the above task.
The second task is connected to factor analysis; however, there is
less data there. So I believe that by solving the first task I'll acquire
the experience needed to solve the second one.
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm