Hi,

On Feb 7, 2008 6:06 PM, edward yoon <[EMAIL PROTECTED]> wrote:
> Actually, My most hadoop applications are made for numeric analysis.
> Therefore, I was tried to make a generalized matrix in/out format.
> https://issues.apache.org/jira/browse/HADOOP-2515
> as a Map<row, Map<column, cell>> structure after review the code and
> discuss with gary bradski.
Wouldn't that mean a huge overhead in storage space? Even if it were just a simple Collection, the whole boxing/unboxing business can easily increase the memory requirements fivefold.

Take the Netflix dataset as an example. It has 100M entries on a scale of 1 to 5. Those values fit in 100 MB of memory if each one is stored in a single byte. Boxing would add an overhead of at least 4 bytes per entry to hold a reference to the object, which leaves us with at least 500 MB for the same amount of actual data. The overhead is probably even higher, as each object will also need some bookkeeping of its own, such as a pointer to its class.

Or did I grossly misunderstand you?

Thanks,
Markus
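
P.S. To make the arithmetic concrete, here is a rough sketch in Java. It only reproduces the back-of-the-envelope estimate; it does not measure a real JVM. The class and method names are made up for illustration, the 4-byte reference size is an assumption (a 32-bit JVM), and object and array headers are ignored, so the boxed figure is a lower bound.

```java
public class BoxingOverhead {
    // Assumption: a reference occupies 4 bytes (e.g. on a 32-bit JVM).
    static final long REFERENCE_BYTES = 4;
    // A rating on a scale of 1 to 5 fits in a single byte.
    static final long PAYLOAD_BYTES = 1;

    /** Bytes needed for n ratings held in a primitive byte[] (array header ignored). */
    static long primitiveBytes(long n) {
        return n * PAYLOAD_BYTES;
    }

    /** Lower bound for n boxed ratings: one reference plus the payload, object headers ignored. */
    static long boxedLowerBoundBytes(long n) {
        return n * (REFERENCE_BYTES + PAYLOAD_BYTES);
    }

    public static void main(String[] args) {
        long n = 100_000_000L; // roughly the number of ratings in the Netflix dataset
        System.out.println("primitive: " + primitiveBytes(n) / 1_000_000 + " MB");
        System.out.println("boxed (lower bound): " + boxedLowerBoundBytes(n) / 1_000_000 + " MB");
    }
}
```

With these assumptions the sketch gives 100 MB for the primitive layout against 500 MB for the boxed one. Nested maps would add per-entry objects on top of that.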
