Also, Java serialization often keeps references to previously read
objects in memory. It is better to use Thrift or Avro to serialize the
object.
In my experience serialization is inefficient for large object graphs,
but works fine for smaller graphs (depending on how much memory you
have to work with).
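
To illustrate the point about object graphs, here is a minimal sketch of storing the map as flat key/value records rather than as one serialized graph. It is not Thrift or Avro, only a plain-JDK stand-in for the same idea. The class name FlatMapIo is hypothetical, and String keys and values are an assumption, since the thread does not say what the HashMap holds.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not from the thread: stores a map as flat records
// (an entry count followed by key/value pairs) instead of one object graph.
final class FlatMapIo {
    private FlatMapIo() {}

    // Assumes String keys and values; the real types are not stated in the thread.
    static void write(Map<String, String> map, OutputStream out) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(map.size());
        for (Map.Entry<String, String> e : map.entrySet()) {
            // writeUTF is limited to 65535 encoded bytes per string.
            data.writeUTF(e.getKey());
            data.writeUTF(e.getValue());
        }
        data.flush();
    }

    static Map<String, String> read(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int size = data.readInt();
        // Pre-size the table so it does not rehash repeatedly while loading.
        Map<String, String> map = new HashMap<>(Math.max(16, (int) (size / 0.75f) + 1));
        for (int i = 0; i < size; i++) {
            String key = data.readUTF();
            map.put(key, data.readUTF());
        }
        return map;
    }
}
```

The reader here only ever holds the entries it has put into the map, with no separate table of back-references. For a 200M file you would wrap the file streams in BufferedOutputStream and BufferedInputStream before passing them in.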
Also, for that small an amount of data, memcache and mongo may be
overkill (unless the data changes frequently).
Cheers,
Matt
On Oct 13, 2010, at 11:04 AM, "Shi Yu" <[email protected]> wrote:
As a follow-up to my own question, I think launching the JVM in
Hadoop requires much more memory than an ordinary JVM. I found that
instead of serializing the object, I could perhaps create a MapFile
as an index to permit lookups by key in Hadoop. I have also compared
the performance of MongoDB and Memcache. I will let you know the
result after I try the MapFile approach.
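
The idea behind the MapFile approach can be sketched without Hadoop: keep the entries sorted by key in a data file, hold only every Nth key and its file offset in memory, and answer a lookup by seeking to the nearest indexed key and scanning forward. The class below is a plain-JDK imitation under those assumptions, not Hadoop's MapFile API. The name SparseIndexFile, the String key/value types, and the interval parameter are all hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical class, not Hadoop's MapFile: a plain-JDK imitation of the idea.
final class SparseIndexFile {
    private final File dataFile;
    // Sparse in-memory index: every Nth key mapped to its offset in the data file.
    private final TreeMap<String, Long> index = new TreeMap<>();

    SparseIndexFile(File dataFile) {
        this.dataFile = dataFile;
    }

    // Writes entries in key order and indexes every interval-th key.
    // Assumes the map uses natural String ordering, matching the index.
    void write(SortedMap<String, String> sorted, int interval) throws IOException {
        index.clear();
        try (RandomAccessFile out = new RandomAccessFile(dataFile, "rw")) {
            out.setLength(0);
            int n = 0;
            for (Map.Entry<String, String> e : sorted.entrySet()) {
                if (n++ % interval == 0) {
                    index.put(e.getKey(), out.getFilePointer());
                }
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
        }
    }

    // Returns the value for key, or null if the key is not in the file.
    String get(String key) throws IOException {
        Map.Entry<String, Long> start = index.floorEntry(key);
        if (start == null) {
            return null; // key sorts before the first stored key
        }
        try (RandomAccessFile in = new RandomAccessFile(dataFile, "r")) {
            in.seek(start.getValue());
            while (in.getFilePointer() < in.length()) {
                String k = in.readUTF();
                String v = in.readUTF();
                int cmp = k.compareTo(key);
                if (cmp == 0) {
                    return v;
                }
                if (cmp > 0) {
                    return null; // scanned past where the key would be
                }
            }
        }
        return null;
    }
}
```

With an interval of 128, for example, only about 1 in 128 keys stays on the heap, at the cost of a seek and a short scan per lookup. The unbuffered RandomAccessFile keeps the sketch short; a real implementation would buffer the reads.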
Shi
On 2010-10-12 21:59, M. C. Srivas wrote:
On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu <[email protected]> wrote:
Hi,
I want to load a serialized HashMap object in Hadoop. The file holding
the stored object is 200M. I could read that object efficiently in Java
by setting -Xmx to 1000M. However, in Hadoop I could never load it into
memory. The code is very simple (it just reads from the
ObjectInputStream) and there is no map/reduce implemented yet. I set
mapred.child.java.opts=-Xmx3000M and still get
"java.lang.OutOfMemoryError: Java heap space". Could anyone explain a
little bit how memory is allocated to the JVM in Hadoop? Why does Hadoop
take up so much memory? If a program requires 1G of memory on a single
node, how much memory does it (generally) require in Hadoop?
The JVM reserves swap space in advance, at the time of launching the
process. If your swap is too low (or you do not have any swap
configured), you will hit this.
Or, you are on a 32-bit machine, in which case 3G is not possible in
the JVM.
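
One way to narrow this down is to have the task report the heap limit and architecture it actually sees, which shows whether the -Xmx3000M setting reached the child JVM. The helper below is a small sketch; the class name HeapReport is hypothetical, and the "sun.arch.data.model" property is specific to Sun/Oracle-derived JVMs, so it may come back as "unknown" elsewhere.

```java
// Hypothetical diagnostic helper, not from the thread.
final class HeapReport {
    private HeapReport() {}

    // Maximum heap this JVM will attempt to use, in megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // One-line summary suitable for a task log.
    static String describe() {
        String arch = System.getProperty("os.arch", "unknown");
        // Not a standard property; present on Sun/Oracle-derived JVMs ("32" or "64").
        String dataModel = System.getProperty("sun.arch.data.model", "unknown");
        return String.format("max heap = %d MB, os.arch = %s, data model = %s",
                maxHeapMegabytes(), arch, dataModel);
    }
}
```

Calling System.err.println(HeapReport.describe()) at the start of the task and checking the task log would show whether the reported heap is close to the requested 3000M, and whether the JVM is running with a 32-bit data model.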
-Srivas.
Thanks.
Shi