If you are running a 32-bit JVM then, depending on the OS, you can't go beyond roughly 3 GB of heap.
If you are on Windows, 2 GB of heap is your best bet for a 32-bit JVM.
Try this:
Edit conf/hadoop-env.sh and add
export HADOOP_CLIENT_OPTS="-Xmx3G ${HADOOP_CLIENT_OPTS}"
and then run your hadoop commands.
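To verify that the setting actually reached the client JVM, a quick check is something like the sketch below (the class name HeapCheck is just a placeholder); run it once with plain java and once via bin/hadoop jar, and compare the heap each launcher grants:

import java.text.NumberFormat;

public class HeapCheck {
    public static void main(String[] args) {
        // Maximum heap this JVM will attempt to use, i.e. the effective -Xmx.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap available to this JVM: "
                + NumberFormat.getInstance().format(maxBytes) + " bytes");
    }
}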
-Bharath
From: Shi Yu <[email protected]>
To: [email protected]
Cc:
Sent: Wednesday, October 13, 2010 4:18:17 PM
Subject: Re: load a serialized object in hadoop
Hi, I tried the following five ways:
Approach 1: in command line
HADOOP_CLIENT_OPTS=-Xmx4000m bin/hadoop jar WordCount.jar OOloadtest
Approach 2: I added the following element to the hadoop-site.xml file.
Each time I changed it, I stopped and restarted hadoop on all the nodes.
...
<property>
<name>HADOOP_CLIENT_OPTS</name>
<value>-Xmx4000m</value>
</property>
run the command
$bin/hadoop jar WordCount.jar OOloadtest
Approach 3: I changed it to this:
...
<property>
<name>HADOOP_CLIENT_OPTS</name>
<value>4000m</value>
</property>
....
Then run the command:
$bin/hadoop jar WordCount.jar OOloadtest
Approach 4: To make sure, I replaced the "m" suffix with the number of bytes:
...
<property>
<name>HADOOP_CLIENT_OPTS</name>
<value>4000000000</value>
</property>
....
Then run the command:
$bin/hadoop jar WordCount.jar OOloadtest
All four of these approaches lead to the same "Java heap space" error.
java.lang.OutOfMemoryError: Java heap space
    at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:45)
    at java.lang.StringBuilder.<init>(StringBuilder.java:68)
    at java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:2997)
    at java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:2818)
    at java.io.ObjectInputStream.readString(ObjectInputStream.java:1599)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1320)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
    at java.util.HashMap.readObject(HashMap.java:1028)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1846)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1753)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
    at ObjectManager.loadObject(ObjectManager.java:42)
    at OOloadtest.main(OOloadtest.java:21)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
    at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
Approach 5:
For comparison, I called the Java command directly as follows (there is a
timer showing how long it takes if the serialized object is successfully
loaded):
$java -Xms3G -Xmx3G -classpath
.:WordCount.jar:hadoop-0.19.2-core.jar:lib/log4j-1.2.15.jar OOloadtest
return:
object loaded, timing (hms): 0 hour(s) 1 minute(s) 12 second(s) 162 millisecond(s)
What was the problem with my command? Where can I find the documentation
about HADOOP_CLIENT_OPTS? Have you tried the same thing and found that it works?
Shi
On 2010-10-13 16:28, Luke Lu wrote:
> On Wed, Oct 13, 2010 at 2:21 PM, Shi Yu<[email protected]> wrote:
>
>> Hi, thanks for the advice. I tried with your settings,
>> $ bin/hadoop jar Test.jar OOloadtest -D HADOOP_CLIENT_OPTS=-Xmx4000m
>> but there is still no effect. Or is this a system variable? Should I export
>> it? How do I configure it?
>>
> HADOOP_CLIENT_OPTS is an environment variable, so you should run it as
> HADOOP_CLIENT_OPTS=-Xmx1000m bin/hadoop jar Test.jar OOloadtest
>
> if you use sh-derivative shells (bash, ksh, etc.); prepend env for other shells.
>
> __Luke
>
>
>
>> Shi
>>
>> java -Xms3G -Xmx3G -classpath
>> .:WordCount.jar:hadoop-0.19.2-core.jar:lib/log4j-1.2.15.jar:lib/commons-collections-3.2.1.jar:lib/stanford-postagger-2010-05-26.jar
>> OOloadtest
>>
>>
>> On 2010-10-13 15:28, Luke Lu wrote:
>>
>>> On Wed, Oct 13, 2010 at 12:27 PM, Shi Yu<[email protected]> wrote:
>>>
>>>
>>>> I haven't implemented anything in map/reduce yet for this issue. I just
>>>> tried to invoke the same java class using the bin/hadoop command. The
>>>> thing is that a very simple program can be executed in Java, but is not
>>>> doable via the bin/hadoop command.
>>>>
>>>>
>>> If you are just trying to use bin/hadoop jar your.jar command, your
>>> code runs in a local client jvm and mapred.child.java.opts has no
>>> effect. You should run it with HADOOP_CLIENT_OPTS=-Xmx1000m bin/hadoop
>>> jar your.jar
>>>
>>>
>>>
>>>> I think if I couldn't get through the first stage, then even if I had a
>>>> map/reduce program it would also fail. I am using Hadoop 0.19.2. Thanks.
>>>>
>>>> Best Regards,
>>>>
>>>> Shi
>>>>
>>>> On 2010-10-13 14:15, Luke Lu wrote:
>>>>
>>>>
>>>>> Can you post your mapper/reducer implementation? Or are you using
>>>>> hadoop streaming, for which mapred.child.java.opts doesn't apply to
>>>>> the jvm you care about? BTW, what's the hadoop version you're using?
>>>>>
>>>>> On Wed, Oct 13, 2010 at 11:45 AM, Shi Yu<[email protected]> wrote:
>>>>>
>>>>>
>>>>>
>>>>>> Here is my code. There is no Map/Reduce in it. I could run this code
>>>>>> using java -Xmx1000m; however, when using bin/hadoop -D
>>>>>> mapred.child.java.opts=-Xmx3000M, it gives a "Java heap space" error. I
>>>>>> have tried other programs in Hadoop with the same settings, so the
>>>>>> memory is available on my machines.
>>>>>>
>>>>>>
>>>>>> public static void main(String[] args) {
>>>>>>     try {
>>>>>>         String myFile = "xxx.dat";
>>>>>>         FileInputStream fin = new FileInputStream(myFile);
>>>>>>         ObjectInputStream ois = new ObjectInputStream(fin);
>>>>>>         // readObject() returns Object, so cast it back to the HashMap type
>>>>>>         HashMap<?, ?> margintagMap = (HashMap<?, ?>) ois.readObject();
>>>>>>         ois.close();
>>>>>>         fin.close();
>>>>>>     } catch (Exception e) {
>>>>>>         e.printStackTrace();
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> On 2010-10-13 13:30, Luke Lu wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On Wed, Oct 13, 2010 at 8:04 AM, Shi Yu<[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> As a follow-up to my own question, I think invoking the JVM in
>>>>>>>> Hadoop requires much more memory than an ordinary JVM.
>>>>>>>>
>>>>>>> That's simply not true. The default mapreduce task Xmx is 200M, which
>>>>>>> is much smaller than the standard jvm default of 512M, and most users
>>>>>>> don't need to increase it. Please post the code reading the object (in
>>>>>>> hdfs?) in your tasks.
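For reference, if the serialized object were stored in HDFS rather than on the local filesystem, loading it could look roughly like the sketch below; the class name, path argument, and generic types are assumptions for illustration, not code from this thread:

import java.io.ObjectInputStream;
import java.util.HashMap;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsObjectLoader {
    // Reads a serialized HashMap from the given HDFS path (path is hypothetical).
    public static HashMap<?, ?> load(Configuration conf, String hdfsPath) throws Exception {
        FileSystem fs = FileSystem.get(conf);
        ObjectInputStream ois = new ObjectInputStream(fs.open(new Path(hdfsPath)));
        try {
            return (HashMap<?, ?>) ois.readObject();
        } finally {
            ois.close();
        }
    }
}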
>>>>>>>
>>>>>>>> I found that instead of serializing the object, maybe I could create
>>>>>>>> a MapFile as an index to permit lookups by key in Hadoop. I have also
>>>>>>>> compared the performance of MongoDB and Memcached. I will let you know
>>>>>>>> the result after I try the MapFile approach.
>>>>>>>>
>>>>>>>> Shi
>>>>>>>>
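For reference, a minimal sketch of the MapFile idea mentioned above, assuming Text keys and values and the org.apache.hadoop.io.MapFile API; the directory name and entries are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;

public class MapFileExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        String dir = "lookup.map";  // placeholder directory for the MapFile

        // Write entries; MapFile requires keys to be appended in sorted order.
        MapFile.Writer writer = new MapFile.Writer(conf, fs, dir, Text.class, Text.class);
        writer.append(new Text("apple"), new Text("1"));
        writer.append(new Text("banana"), new Text("2"));
        writer.close();

        // Look up a single key without loading the whole map into memory.
        MapFile.Reader reader = new MapFile.Reader(fs, dir, conf);
        Text value = new Text();
        reader.get(new Text("banana"), value);
        System.out.println("banana -> " + value);
        reader.close();
    }
}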
>>>>>>>> On 2010-10-12 21:59, M. C. Srivas wrote:
>>>>>>>>>
>>>>>>>>> On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu<[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I want to load a serialized HashMap object in hadoop. The file of the
>>>>>>>>>> stored object is 200M. I could read that object efficiently in Java by
>>>>>>>>>> setting -Xmx to 1000M. However, in hadoop I could never load it into
>>>>>>>>>> memory. The code is very simple (it just reads the ObjectInputStream)
>>>>>>>>>> and there is no map/reduce implemented yet. I set
>>>>>>>>>> mapred.child.java.opts=-Xmx3000M and still get
>>>>>>>>>> "java.lang.OutOfMemoryError: Java heap space". Could anyone explain a
>>>>>>>>>> little bit how memory is allocated to the JVM in hadoop? Why does
>>>>>>>>>> hadoop take up so much memory? If a program requires 1G of memory on a
>>>>>>>>>> single node, how much memory does it require (generally) in Hadoop?
>>>>>>>>>
>>>>>>>>> The JVM reserves swap space in advance, at the time of launching the
>>>>>>>>> process. If your swap is too low (or you do not have any swap
>>>>>>>>> configured), you will hit this.
>>>>>>>>>
>>>>>>>>> Or, you are on a 32-bit machine, in which case 3G is not possible in
>>>>>>>>> the JVM.
>>>>>>>>>
>>>>>>>>> -Srivas.
>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>> Shi
>>>>>>>>>>
>>>>>>>>>> --
>>>>>> --
>>>>>> Postdoctoral Scholar
>>>>>> Institute for Genomics and Systems Biology
>>>>>> Department of Medicine, the University of Chicago
>>>>>> Knapp Center for Biomedical Discovery
>>>>>> 900 E. 57th St. Room 10148
>>>>>> Chicago, IL 60637, US
>>>>>> Tel: 773-702-6799