Hi,

Which Hadoop version are you running? Your error looks similar to
https://issues.apache.org/jira/browse/MAPREDUCE-1182
On Fri, Dec 2, 2011 at 11:17 AM, Ravi teja ch n v <raviteja.c...@huawei.com> wrote:

> Hi Bobby,
>
> You are right that the Map outputs, when copied, will be spilled to
> disk, but only in the case where the reducer cannot accommodate the copy
> in memory (shuffleInMemory and shuffleToDisk are chosen by the RamManager
> based on the in-memory size).
>
> But according to the stack trace provided by Mingxi,
>
> > org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1592)
>
> the problem occurred after the in-memory copy was chosen.
>
> Regards,
> Ravi Teja
> ________________________________________
> From: Robert Evans [ev...@yahoo-inc.com]
> Sent: 01 December 2011 21:44:50
> To: common-dev@hadoop.apache.org
> Subject: Re: Hadoop - non disk based sorting?
>
> Mingxi,
>
> My understanding was that, just like with the maps, when a reducer's
> in-memory buffer fills up it too will spill to disk as part of the sort.
> In fact, I think it uses the exact same code for the sort as the map
> does. There may be an issue where your sort buffer is somehow too large
> for the amount of heap that you requested as part of
> mapred.child.java.opts. I have personally run a reduce that took in 300 GB
> of data, which it successfully sorted, to test this very thing. And no,
> the box did not have 300 GB of RAM.
>
> --Bobby Evans
>
> On 12/1/11 4:12 AM, "Ravi teja ch n v" <raviteja.c...@huawei.com> wrote:
>
> Hi Mingxi,
>
> > So why, when map outputs are huge, will the reducer not be able to copy them?
>
> The Reducer will copy the Map output into its in-memory buffer. When the
> Reducer JVM doesn't have enough memory to accommodate the Map output, it
> leads to an OutOfMemoryError.
>
> > Can you please kindly explain what's the function of
> > mapred.child.java.opts? How does it relate to the copy?
>
> The Maps and Reducers are launched in separate child JVMs at the
> TaskTrackers. When the TaskTracker launches the Map or Reduce JVMs, it
> uses mapred.child.java.opts as the JVM arguments for the new child JVMs.
>
> Regards,
> Ravi Teja
> ________________________________________
> From: Mingxi Wu [mingxi...@turn.com]
> Sent: 01 December 2011 12:37:54
> To: common-dev@hadoop.apache.org
> Subject: RE: Hadoop - non disk based sorting?
>
> Thanks Ravi.
>
> So why, when map outputs are huge, will the reducer not be able to copy them?
>
> Can you please kindly explain what's the function of
> mapred.child.java.opts? How does it relate to the copy?
>
> Thank you,
>
> Mingxi
>
> -----Original Message-----
> From: Ravi teja ch n v [mailto:raviteja.c...@huawei.com]
> Sent: Tuesday, November 29, 2011 9:46 PM
> To: common-dev@hadoop.apache.org
> Subject: RE: Hadoop - non disk based sorting?
>
> Hi Mingxi,
>
> From your stack trace, I understand that the OutOfMemoryError actually
> occurred while copying the Map outputs, not while sorting them.
>
> Since your Map outputs are huge and your reducer does not have enough heap
> memory, you got this problem.
> When you increased the number of reducers to 200, your Map outputs were
> partitioned among 200 reducers, so you didn't hit this problem.
>
> By raising the max memory of your reducer with mapred.child.java.opts, you
> can get over this problem.
>
> Regards,
> Ravi Teja
>
> ________________________________________
> From: Mingxi Wu [mingxi...@turn.com]
> Sent: 30 November 2011 05:14:49
> To: common-dev@hadoop.apache.org
> Subject: Hadoop - non disk based sorting?
>
> Hi,
>
> I have a question regarding the shuffle phase of the reducer.
>
> It appears that when there is a large map output (in my case, 5 billion
> records), I get an out-of-memory error like the one below.
>
> Error: java.lang.OutOfMemoryError: Java heap space
>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1592)
>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1452)
>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1301)
>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1233)
>
> However, I thought the shuffle phase uses a disk-based sort, which is not
> constrained by memory. So why would a user run into this OutOfMemoryError?
> After I increased my number of reducers from 100 to 200, the problem went
> away.
>
> Any input regarding this memory issue would be appreciated!
>
> Thanks,
>
> Mingxi

--
Thanks & Regards,
Chandra Prakash Bhagtani,
Nokia India Pvt. Ltd.
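
For anyone hitting the same error, here is a minimal sketch of the two remedies discussed in the thread, using the old 0.20/1.x JobConf API. The class name, heap size, and reducer count are illustrative values, not taken from the thread:

    import org.apache.hadoop.mapred.JobConf;

    public class ShuffleTuningSketch {
        public static void main(String[] args) {
            JobConf conf = new JobConf(ShuffleTuningSketch.class);

            // Remedy 1: give each child task JVM a larger heap, so the
            // reducer's in-memory shuffle buffer can hold the copied
            // map outputs (value shown here is an example).
            conf.set("mapred.child.java.opts", "-Xmx1024m");

            // Remedy 2: spread the map output across more partitions so
            // each reducer copies less data (the 100 -> 200 change that
            // made the error go away for Mingxi).
            conf.setNumReduceTasks(200);

            // ... set mapper/reducer classes and input/output paths,
            // then submit with JobClient.runJob(conf).
        }
    }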
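
And a back-of-the-envelope sketch of how the in-memory vs. on-disk shuffle decision Ravi mentions is sized. The 0.70 and 0.25 fractions are assumptions based on the 0.20/1.x defaults (mapred.job.shuffle.input.buffer.percent and the per-segment cap in ReduceTask's RamManager); check the source of your own version:

    public class ShuffleSizingSketch {
        public static void main(String[] args) {
            // Assumed reducer heap from mapred.child.java.opts, e.g. -Xmx200m.
            long reducerHeapBytes = 200L * 1024 * 1024;

            // Fraction of the heap reserved for the in-memory shuffle buffer
            // (assumed default of mapred.job.shuffle.input.buffer.percent).
            double shuffleBufferPercent = 0.70;
            long shuffleBufferBytes = (long) (reducerHeapBytes * shuffleBufferPercent);

            // A single map output segment is shuffled in memory only if it
            // fits under a fraction of that buffer (roughly 25% in the
            // 0.20/1.x code); larger segments are shuffled to disk instead.
            long maxSingleSegmentBytes = (long) (shuffleBufferBytes * 0.25);

            System.out.printf("shuffle buffer ~%d MB, per-segment in-memory limit ~%d MB%n",
                    shuffleBufferBytes >> 20, maxSingleSegmentBytes >> 20);

            // With several copier threads filling the buffer concurrently, a
            // heap this small can still hit OutOfMemoryError inside
            // shuffleInMemory, which matches the stack trace in this thread.
        }
    }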