On Wed, Oct 13, 2010 at 12:27 PM, Shi Yu <[email protected]> wrote:
> I haven't implemented anything in map/reduce yet for this issue. I just
> tried to invoke the same Java class using the bin/hadoop command. The
> thing is that a very simple program can be executed with plain java, but
> not via the bin/hadoop command.
If you are just using the bin/hadoop jar your.jar command, your code runs
in a local client JVM, so mapred.child.java.opts has no effect. You should
run it with:

    HADOOP_CLIENT_OPTS=-Xmx1000m bin/hadoop jar your.jar

> I think if I couldn't get through this first stage, even if I had a
> map/reduce program it would also fail. I am using Hadoop 0.19.2. Thanks.
>
> Best Regards,
>
> Shi
>
> On 2010-10-13 14:15, Luke Lu wrote:
>> Can you post your mapper/reducer implementation? Or are you using
>> hadoop streaming, for which mapred.child.java.opts doesn't apply to
>> the jvm you care about? BTW, what's the hadoop version you're using?
>>
>> On Wed, Oct 13, 2010 at 11:45 AM, Shi Yu <[email protected]> wrote:
>>> Here is my code. There is no Map/Reduce in it. I could run this code
>>> using java -Xmx1000m; however, when using bin/hadoop -D
>>> mapred.child.java.opts=-Xmx3000M it fails with a heap space error. I
>>> have tried other programs in Hadoop with the same settings, so the
>>> memory is available on my machines.
>>>
>>> public static void main(String[] args) {
>>>     try {
>>>         String myFile = "xxx.dat";
>>>         FileInputStream fin = new FileInputStream(myFile);
>>>         ois = new ObjectInputStream(fin);
>>>         margintagMap = ois.readObject();
>>>         ois.close();
>>>         fin.close();
>>>     } catch (Exception e) {
>>>         //
>>>     }
>>> }
>>>
>>> On 2010-10-13 13:30, Luke Lu wrote:
>>>> On Wed, Oct 13, 2010 at 8:04 AM, Shi Yu <[email protected]> wrote:
>>>>> As a follow-up to my own question, I think invoking the JVM in
>>>>> Hadoop requires much more memory than an ordinary JVM.
>>>>
>>>> That's simply not true. The default mapreduce task Xmx is 200M, which
>>>> is much smaller than the standard jvm default of 512M, and most users
>>>> don't need to increase it. Please post the code reading the object
>>>> (in hdfs?) in your tasks.
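As an aside, the snippet Shi posted will not compile as shown: `ois` and `margintagMap` are never declared, and `readObject()` returns `Object`, which needs a cast. A self-contained corrected sketch follows; the class name, the map's type parameters, and the round-trip `main` are illustrative, since the original `xxx.dat` file is not available here:

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;

public class SerializedMapLoader {

    // Load a serialized HashMap from disk, as in the posted snippet,
    // but with the stream declared locally and the result cast.
    @SuppressWarnings("unchecked")
    static HashMap<String, String> load(String path)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois =
                new ObjectInputStream(new FileInputStream(path))) {
            return (HashMap<String, String>) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Demo round trip with a small map; the real file in the
        // thread ("xxx.dat", ~200M) is not available.
        HashMap<String, String> margintagMap = new HashMap<>();
        margintagMap.put("key", "value");

        String path = "demo.dat";
        try (ObjectOutputStream oos =
                new ObjectOutputStream(new FileOutputStream(path))) {
            oos.writeObject(margintagMap);
        }

        HashMap<String, String> loaded = load(path);
        System.out.println("loaded " + loaded.size() + " entries: " + loaded);
    }
}
```

Note that running this under bin/hadoop still requires HADOOP_CLIENT_OPTS, as Luke points out above, because the client JVM does not read mapred.child.java.opts.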
>>>>> I found that instead of serializing the object, maybe I could
>>>>> create a MapFile as an index to permit lookups by key in Hadoop. I
>>>>> have also compared the performance of MongoDB and Memcache. I will
>>>>> let you know the result after I try the MapFile approach.
>>>>>
>>>>> Shi
>>>>>
>>>>> On 2010-10-12 21:59, M. C. Srivas wrote:
>>>>>> On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu <[email protected]> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I want to load a serialized HashMap object in hadoop. The file of
>>>>>>> the stored object is 200M. I could read that object efficiently in
>>>>>>> Java by setting -Xmx to 1000M. However, in hadoop I could never
>>>>>>> load it into memory. The code is very simple (just read the
>>>>>>> ObjectInputStream) and there is yet no map/reduce implemented. I
>>>>>>> set mapred.child.java.opts=-Xmx3000M and still get
>>>>>>> "java.lang.OutOfMemoryError: Java heap space". Could anyone
>>>>>>> explain a little how memory is allocated to the JVM in hadoop?
>>>>>>> Why does hadoop take up so much memory? If a program requires 1G
>>>>>>> of memory on a single node, how much memory does it (generally)
>>>>>>> require in Hadoop?
>>>>>>
>>>>>> The JVM reserves swap space in advance, at the time of launching
>>>>>> the process. If your swap is too low (or you do not have any swap
>>>>>> configured), you will hit this.
>>>>>> Or, you are on a 32-bit machine, in which case 3G is not possible
>>>>>> in the JVM.
>>>>>>
>>>>>> -Srivas.
>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> Shi
>>>
>>> --
>>> Postdoctoral Scholar
>>> Institute for Genomics and Systems Biology
>>> Department of Medicine, the University of Chicago
>>> Knapp Center for Biomedical Discovery
>>> 900 E. 57th St. Room 10148
>>> Chicago, IL 60637, US
>>> Tel: 773-702-6799
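One way to test both of Srivas's hypotheses (whether the requested heap is actually reaching the JVM, and whether the JVM is 32- or 64-bit) is to print what the JVM sees at startup. A minimal diagnostic along these lines, assuming a HotSpot JVM for the sun.arch.data.model property (class name illustrative):

```java
public class JvmHeapCheck {
    public static void main(String[] args) {
        // Max heap the JVM will attempt to use; reflects the effective -Xmx.
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);

        // "32" or "64" on HotSpot JVMs; the property may be absent on
        // other JVMs, so fall back to os.arch.
        String dataModel = System.getProperty("sun.arch.data.model",
                System.getProperty("os.arch"));

        System.out.println("max heap:   " + maxHeapMb + " MB");
        System.out.println("data model: " + dataModel);
    }
}
```

Running this once under plain java -Xmx1000m and once under bin/hadoop would show whether the two JVMs actually receive the same heap limit; if they differ, the -Xmx setting is not reaching the client JVM.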
