On Wed, Oct 13, 2010 at 2:21 PM, Shi Yu <[email protected]> wrote:
> Hi, thanks for the advice. I tried with your settings,
>
>   $ bin/hadoop jar Test.jar OOloadtest -D HADOOP_CLIENT_OPTS=-Xmx4000m
>
> but still no effect. Or is this a system variable? Should I export it?
> How do I configure it?
HADOOP_CLIENT_OPTS is an environment variable, so you should run it as

  HADOOP_CLIENT_OPTS=-Xmx1000m bin/hadoop jar Test.jar OOloadtest

if you use sh-derivative shells (bash, ksh, etc.); for other shells, prepend
env. (The -D flag sets Hadoop configuration properties, not environment
variables, which is why it had no effect.)

__Luke

> Shi
>
> java -Xms3G -Xmx3G -classpath .:WordCount.jar:hadoop-0.19.2-core.jar:lib/log4j-1.2.15.jar:lib/commons-collections-3.2.1.jar:lib/stanford-postagger-2010-05-26.jar OOloadtest
>
> On 2010-10-13 15:28, Luke Lu wrote:
>> On Wed, Oct 13, 2010 at 12:27 PM, Shi Yu <[email protected]> wrote:
>>> I haven't implemented anything in map/reduce yet for this issue. I just
>>> tried to invoke the same Java class using the bin/hadoop command. The
>>> thing is that a very simple program can be executed with plain Java,
>>> but not via the bin/hadoop command.
>>
>> If you are just trying to use the bin/hadoop jar your.jar command, your
>> code runs in a local client JVM and mapred.child.java.opts has no
>> effect. You should run it with HADOOP_CLIENT_OPTS=-Xmx1000m bin/hadoop
>> jar your.jar
>>
>>> I think if I can't get through this first stage, a map/reduce program
>>> would also fail even if I had one. I am using Hadoop 0.19.2. Thanks.
>>>
>>> Best Regards,
>>>
>>> Shi
>>>
>>> On 2010-10-13 14:15, Luke Lu wrote:
>>>> Can you post your mapper/reducer implementation? Or are you using
>>>> Hadoop Streaming, for which mapred.child.java.opts doesn't apply to
>>>> the JVM you care about? BTW, what's the Hadoop version you're using?
>>>>
>>>> On Wed, Oct 13, 2010 at 11:45 AM, Shi Yu <[email protected]> wrote:
>>>>> Here is my code. There is no Map/Reduce in it. I can run this code
>>>>> using java -Xmx1000m; however, when using bin/hadoop -D
>>>>> mapred.child.java.opts=-Xmx3000M it fails with a heap-space error. I
>>>>> have tried other programs in Hadoop with the same settings, so the
>>>>> memory is available on my machines.
>>>>>
>>>>> // snippet from a larger class; imports (java.io.*, java.util.HashMap)
>>>>> // are omitted, as in the original message
>>>>> public static void main(String[] args) {
>>>>>     try {
>>>>>         String myFile = "xxx.dat";
>>>>>         FileInputStream fin = new FileInputStream(myFile);
>>>>>         ObjectInputStream ois = new ObjectInputStream(fin);
>>>>>         HashMap<?, ?> margintagMap = (HashMap<?, ?>) ois.readObject();
>>>>>         ois.close();
>>>>>         fin.close();
>>>>>     } catch (Exception e) {
>>>>>         // swallowing the exception hides the real failure; log it
>>>>>         e.printStackTrace();
>>>>>     }
>>>>> }
>>>>>
>>>>> On 2010-10-13 13:30, Luke Lu wrote:
>>>>>> On Wed, Oct 13, 2010 at 8:04 AM, Shi Yu <[email protected]> wrote:
>>>>>>> As a follow-up to my own question: I think invoking the JVM through
>>>>>>> Hadoop requires much more memory than an ordinary JVM.
>>>>>>
>>>>>> That's simply not true. The default mapreduce task Xmx is 200M, which
>>>>>> is much smaller than the standard JVM default of 512M, and most users
>>>>>> don't need to increase it. Please post the code reading the object
>>>>>> (in HDFS?) in your tasks.
>>>>>>
>>>>>>> I found that instead of serializing the object, maybe I could create
>>>>>>> a MapFile as an index to permit lookups by key in Hadoop. I have
>>>>>>> also compared the performance of MongoDB and Memcached. I will let
>>>>>>> you know the result after I try the MapFile approach.
>>>>>>>
>>>>>>> Shi
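(For reference, a minimal sketch of the MapFile approach mentioned above,
using the 0.19-era org.apache.hadoop.io API. The directory name, key/value
types, and sample entry are illustrative assumptions, not details from the
thread.)

  import java.io.IOException;
  import java.util.Map;
  import java.util.TreeMap;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.io.MapFile;
  import org.apache.hadoop.io.Text;

  public class MarginTagMapFile {
      public static void main(String[] args) throws IOException {
          Configuration conf = new Configuration();
          FileSystem fs = FileSystem.get(conf);
          String dir = "margintag.map"; // assumed output directory

          // MapFile.Writer requires keys in sorted order, hence TreeMap.
          Map<String, String> entries = new TreeMap<String, String>();
          entries.put("some-key", "some-value"); // stand-in data

          MapFile.Writer writer =
              new MapFile.Writer(conf, fs, dir, Text.class, Text.class);
          for (Map.Entry<String, String> e : entries.entrySet()) {
              writer.append(new Text(e.getKey()), new Text(e.getValue()));
          }
          writer.close();

          // Look a value up by key without loading the whole map into
          // the heap, unlike deserializing a 200M HashMap.
          MapFile.Reader reader = new MapFile.Reader(fs, dir, conf);
          Text value = new Text();
          if (reader.get(new Text("some-key"), value) != null) {
              System.out.println("found: " + value);
          }
          reader.close();
      }
  }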
>>>>>>> On 2010-10-12 21:59, M. C. Srivas wrote:
>>>>>>>> On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu <[email protected]> wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I want to load a serialized HashMap object in hadoop. The file of
>>>>>>>>> the stored object is 200M. I can read that object efficiently in
>>>>>>>>> Java by setting -Xmx to 1000M. However, in hadoop I can never
>>>>>>>>> load it into memory. The code is very simple (it just reads the
>>>>>>>>> ObjectInputStream) and there is as yet no map/reduce implemented.
>>>>>>>>> I set mapred.child.java.opts=-Xmx3000M and still get
>>>>>>>>> "java.lang.OutOfMemoryError: Java heap space". Could anyone
>>>>>>>>> explain a little bit how memory is allocated to the JVM in
>>>>>>>>> hadoop? Why does hadoop take up so much memory? If a program
>>>>>>>>> requires 1G of memory on a single node, how much memory does it
>>>>>>>>> require (generally) in Hadoop?
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>> Shi
>>>>>>>>
>>>>>>>> The JVM reserves swap space in advance, at the time of launching
>>>>>>>> the process. If your swap is too low (or you do not have any swap
>>>>>>>> configured), you will hit this. Or you are on a 32-bit machine, in
>>>>>>>> which case 3G is not possible in the JVM.
>>>>>>>>
>>>>>>>> -Srivas.
>>>
>>> --
>>> Postdoctoral Scholar
>>> Institute for Genomics and Systems Biology
>>> Department of Medicine, the University of Chicago
>>> Knapp Center for Biomedical Discovery
>>> 900 E. 57th St. Room 10148
>>> Chicago, IL 60637, US
>>> Tel: 773-702-6799
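(A quick way to check which heap setting actually reaches the JVM in
question: compile a trivial class like the hypothetical HeapCheck below and
run it both ways; the reported maximum will differ if one of the -Xmx
settings is not being applied.)

  public class HeapCheck {
      public static void main(String[] args) {
          // Prints the max heap of whatever JVM runs this class.
          long maxBytes = Runtime.getRuntime().maxMemory();
          System.out.println("max heap: " + (maxBytes >> 20) + " MB");
      }
  }

For example, HADOOP_CLIENT_OPTS=-Xmx1000m bin/hadoop jar Test.jar HeapCheck
should report roughly 1000 MB if the variable reached the client JVM, while
a plain java HeapCheck shows the ordinary JVM default.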
