> Wouldn't that mean a huge overhead in storage space requirements?

Not exactly. Even if there is some overhead, I'm willing to make that sacrifice in order to get a supercomputer. I also considered various methods for storing vectors and matrices whose elements are dimensioned and generically computable.
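
To make the numbers in the message quoted below concrete, here is a rough sketch in Java. The class and method names are hypothetical, and the sizes (1 byte per value, 4 bytes per reference) are simply the assumptions from Markus's mail, not measurements on a real JVM:

```java
// Back-of-the-envelope estimate only. The constants mirror the figures
// quoted in this thread and are not measured on a real JVM.
final class BoxingOverhead {
    // 1 byte per rating when stored as a primitive (ratings are 1..5).
    static final long VALUE_BYTES = 1L;
    // Assumed lower bound from the thread: 4 bytes per object reference.
    static final long REFERENCE_BYTES = 4L;

    // Memory needed if every entry is stored as a bare 1-byte value.
    static long primitiveBytes(long entries) {
        return entries * VALUE_BYTES;
    }

    // Lower bound if every entry additionally costs one reference.
    // Object headers are deliberately ignored, so the real cost is higher.
    static long boxedLowerBoundBytes(long entries) {
        return entries * (VALUE_BYTES + REFERENCE_BYTES);
    }

    public static void main(String[] args) {
        long entries = 100_000_000L; // Netflix dataset size cited in the thread
        long mb = 1_000_000L;
        System.out.println("primitive: " + primitiveBytes(entries) / mb + " MB");
        System.out.println("boxed (lower bound): " + boxedLowerBoundBytes(entries) / mb + " MB");
    }
}
```

With the figures from the thread this gives 100 MB for primitives and 500 MB as the boxed lower bound, which is where the 5-fold estimate comes from.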

On 2/7/08, Markus Weimer <[EMAIL PROTECTED]> wrote:
> Hi,
>
> On Feb 7, 2008 6:06 PM, edward yoon <[EMAIL PROTECTED]> wrote:
>
> > Actually, My most hadoop applications are made for numeric analysis.
> > Therefore, I was tried to make a generalized matrix in/out format.
> > https://issues.apache.org/jira/browse/HADOOP-2515
> > as a Map<row, Map<column, cell>> structure after review the code and
> > discuss with gary bradski.
> >
>
> Wouldn't that mean a huge overhead in storage space requirements? Even if it
> was just a simple Collection, the whole boxing/unboxing business can easily
> increase the memory requirements 5 fold: Take the netflix dataset as an
> example. It has 100M entries on a scale of 1...5. Those values fit in 100MB
> of memory if they are stored as chars. Boxing would add an overhead of at
> least 4 Bytes per entry to hold a reference to the object, which leaves us
> with at least 500MB for the same amount of actual data stored. Probably the
> overhead is even more, as the object itself will probably need some
> information like a pointer to its class, too.
>
> Or did I grossly misunderstand you?
>
> Thanks,
>
> Markus
>

-- 
B. Regards,
Edward yoon @ NHN, corp.
