I haven't implemented any map/reduce for this issue yet. I just try to invoke the same Java class using the bin/hadoop command. The thing is that a very simple program runs fine in plain Java but fails under the bin/hadoop command. I think that if I can't get past this first stage, a map/reduce program would fail as well even if I had one. I am using Hadoop 0.19.2. Thanks.
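
(It may simply be that mapred.child.java.opts never reaches this JVM: when bin/hadoop launches a class directly, the class runs in the local client JVM, whose heap comes from HADOOP_HEAPSIZE / HADOOP_OPTS in conf/hadoop-env.sh, while mapred.child.java.opts only applies to the task JVMs forked for map/reduce tasks. A rough sketch, assuming the stock 0.19.x scripts and using a made-up class name:

export HADOOP_HEAPSIZE=3000   # MB for the JVM that bin/hadoop itself starts
bin/hadoop com.example.LoadSerializedMap
)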

Best Regards,

Shi

On 2010-10-13 14:15, Luke Lu wrote:
Can you post your mapper/reducer implementation? Or are you using
Hadoop Streaming, for which mapred.child.java.opts doesn't apply to
the JVM you care about? BTW, what's the Hadoop version you're using?

On Wed, Oct 13, 2010 at 11:45 AM, Shi Yu<[email protected]>  wrote:
Here is my code; there is no map/reduce in it. I can run this code with
java -Xmx1000m. However, when I run it with bin/hadoop -D
mapred.child.java.opts=-Xmx3000M, it fails with a heap-space error. I
have run other programs in Hadoop with the same settings, so the memory is
available on my machines.


// needs java.io.FileInputStream, java.io.ObjectInputStream, java.util.HashMap
public static void main(String[] args) {
    try {
        String myFile = "xxx.dat";                          // the ~200M serialized HashMap
        FileInputStream fin = new FileInputStream(myFile);
        ObjectInputStream ois = new ObjectInputStream(fin);
        HashMap margintagMap = (HashMap) ois.readObject();  // deserialize the whole map into memory
        ois.close();
        fin.close();
    } catch (Exception e) {
        e.printStackTrace();                                // report the failure instead of swallowing it
    }
}

On 2010-10-13 13:30, Luke Lu wrote:
On Wed, Oct 13, 2010 at 8:04 AM, Shi Yu<[email protected]>    wrote:

As a follow-up to my own question, I think invoking the JVM in Hadoop
requires much more memory than an ordinary JVM.

That's simply not true. The default MapReduce task Xmx is 200M, which
is much smaller than the standard JVM default of 512M, and most users
don't need to increase it. Please post the code that reads the object (in
HDFS?) in your tasks.
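
If the per-task heap really does need to be raised for a real map/reduce job, the usual place is the job configuration. A minimal sketch, assuming the classic org.apache.hadoop.mapred API in 0.19.x (the class name and heap value below are only illustrative):

import org.apache.hadoop.mapred.JobConf;

public class ChildOptsSketch {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        // Heap for each task JVM forked by the TaskTracker; it does not affect
        // the client JVM that bin/hadoop itself starts.
        conf.set("mapred.child.java.opts", "-Xmx1024m");
        System.out.println(conf.get("mapred.child.java.opts"));
    }
}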


I found that, instead of serializing the object, maybe I could create a
MapFile as an index to permit lookups by key in Hadoop. I have also
compared the performance of MongoDB and Memcache. I will let you know the
result after I try the MapFile approach.
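
Roughly what I have in mind for the MapFile (just a sketch, with made-up key/value types and a made-up directory name, using the stock org.apache.hadoop.io.MapFile API; I have not run it yet):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;

public class MapFileSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        String dir = "margintag.map";  // made-up name; a MapFile is a directory holding data + index files

        // Write the entries once; MapFile.Writer requires keys in ascending order.
        MapFile.Writer writer = new MapFile.Writer(conf, fs, dir, Text.class, Text.class);
        writer.append(new Text("some-key"), new Text("some-value"));
        writer.close();

        // Look up single keys later without deserializing everything.
        MapFile.Reader reader = new MapFile.Reader(fs, dir, conf);
        Text value = new Text();
        if (reader.get(new Text("some-key"), value) != null) {
            System.out.println(value);
        }
        reader.close();
    }
}

The reader keeps only the small index in memory and seeks into the data file, so a lookup should not need the whole 200M in the heap.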

Shi

On 2010-10-12 21:59, M. C. Srivas wrote:

On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu<[email protected]>      wrote:



Hi,

I want to load a serialized HashMap object in Hadoop. The file of the
stored object is 200M. I can read that object efficiently in Java by
setting -Xmx to 1000M. However, in Hadoop I can never load it into memory.
The code is very simple (it just reads an ObjectInputStream) and there is
no map/reduce implemented yet. I set mapred.child.java.opts=-Xmx3000M and
still get "java.lang.OutOfMemoryError: Java heap space". Could anyone
explain a little bit how memory is allocated to the JVM in Hadoop. Why does
Hadoop take up so much memory? If a program requires 1G of memory on a
single node, how much memory does it (generally) require in Hadoop?



The JVM reserves swap space in advance, at the time of launching the
process. If your swap is too low (or you do not have any swap configured),
you will hit this.

Or, you are on a 32-bit machine, in which case a 3G heap is not possible in
the JVM.
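
(A quick sanity check outside Hadoop, just a sketch using the heap size from this thread: run "java -Xmx3000m -version" on the same node; if the JVM cannot even start with that heap, the limit is the OS or a 32-bit JVM rather than anything Hadoop does.)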

-Srivas.


Thanks.

Shi



--
Postdoctoral Scholar
Institute for Genomics and Systems Biology
Department of Medicine, the University of Chicago
Knapp Center for Biomedical Discovery
900 E. 57th St. Room 10148
Chicago, IL 60637, US
Tel: 773-702-6799
