Hi,
On Feb 7, 2008 6:06 PM, edward yoon <[EMAIL PROTECTED]> wrote:

> Actually, most of my Hadoop applications are built for numeric analysis.
> Therefore, I tried to make a generalized matrix in/out format
> (https://issues.apache.org/jira/browse/HADOOP-2515)
> as a Map<row, Map<column, cell>> structure, after reviewing the code and
> discussing it with Gary Bradski.
>
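
The quoted Map<row, Map<column, cell>> layout can be sketched in Java
roughly as follows. This is a hypothetical illustration, not the HADOOP-2515
code: the class name SparseMatrix and the choice of Integer indices and
Double cells are assumptions. Note that every index and cell here is a boxed
object, which is the overhead the reply is concerned about.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a sparse matrix stored as Map<row, Map<column, cell>>.
// Indices are boxed as Integer and cells as Double; the actual HADOOP-2515
// format may use different types.
public class SparseMatrix {
    private final Map<Integer, Map<Integer, Double>> rows = new HashMap<>();

    // Store a value at (row, column), creating the row map on first use.
    public void set(int row, int column, double value) {
        rows.computeIfAbsent(row, r -> new HashMap<>()).put(column, value);
    }

    // Return the stored value, or 0.0 for cells that were never set.
    public double get(int row, int column) {
        Map<Integer, Double> cols = rows.get(row);
        if (cols == null) {
            return 0.0;
        }
        return cols.getOrDefault(column, 0.0);
    }

    // Number of cells that have been explicitly stored.
    public int storedCells() {
        int count = 0;
        for (Map<Integer, Double> cols : rows.values()) {
            count += cols.size();
        }
        return count;
    }

    public static void main(String[] args) {
        SparseMatrix m = new SparseMatrix();
        m.set(0, 3, 5.0);
        m.set(2, 1, 4.0);
        System.out.println(m.get(0, 3));      // prints 5.0
        System.out.println(m.get(1, 1));      // prints 0.0
        System.out.println(m.storedCells());  // prints 2
    }
}
```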

Wouldn't that mean a huge overhead in storage space? Even if it were just a
simple Collection, the whole boxing/unboxing business can easily increase the
memory requirements fivefold. Take the Netflix dataset as an example: it has
100M entries on a scale of 1 to 5. Those values fit in 100MB of memory if
each one is stored as a single byte. Boxing would add an overhead of at least
4 bytes per entry to hold a reference to the object, which leaves us with at
least 500MB for the same amount of actual data. The real overhead is probably
even larger, since each object also carries a header with information such
as a pointer to its class.
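
The estimate above can be reproduced with a few lines of Java. This is a
back-of-the-envelope sketch, assuming 1 byte per rating in the primitive case
and a 4-byte reference per boxed entry (as on a 32-bit JVM or with compressed
references). The class and method names are made up for illustration, and
object headers are deliberately ignored, so the boxed figure is a lower bound.

```java
// Hypothetical back-of-the-envelope calculator for the Netflix example:
// 100M ratings on a 1..5 scale.
public class RatingMemoryEstimate {
    static final long ENTRIES = 100_000_000L;
    // A rating from 1 to 5 fits in a single primitive byte.
    static final long PAYLOAD_BYTES = 1;
    // Assumption: 4-byte object references (32-bit JVM or compressed references).
    static final long REFERENCE_BYTES = 4;

    // Memory when the ratings live in a primitive byte[].
    static long primitiveBytes(long entries) {
        return entries * PAYLOAD_BYTES;
    }

    // Lower bound when every rating is boxed: payload plus one reference
    // per entry. Object headers and collection overhead are ignored.
    static long boxedLowerBoundBytes(long entries) {
        return entries * (PAYLOAD_BYTES + REFERENCE_BYTES);
    }

    public static void main(String[] args) {
        long primitive = primitiveBytes(ENTRIES);
        long boxed = boxedLowerBoundBytes(ENTRIES);
        System.out.println("primitive: " + primitive / 1_000_000 + " MB");         // primitive: 100 MB
        System.out.println("boxed (lower bound): " + boxed / 1_000_000 + " MB");   // boxed (lower bound): 500 MB
        System.out.println("factor: " + boxed / primitive);                        // factor: 5
    }
}
```

With uncompressed 8-byte references (REFERENCE_BYTES = 8), the same lower
bound rises to 900MB.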

Or did I grossly misunderstand you?

Thanks,

Markus
