Thanks Christopher. The heap size for reduce tasks is configured to be 640MB (mapred.child.java.opts set to -Xmx640m).
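For reference, those settings can be applied per job through the old JobConf API. The fragment below is only an illustrative sketch (the class name is made up), not our actual job setup:

    import org.apache.hadoop.mapred.JobConf;

    // Illustrative only: where the settings discussed in this thread
    // live in the 0.20-era API.
    public class ShuffleHeapSettings {
      public static void main(String[] args) {
        JobConf conf = new JobConf(ShuffleHeapSettings.class);
        conf.set("mapred.child.java.opts", "-Xmx640m");  // per-task child JVM heap
        conf.setInt("mapred.reduce.parallel.copies", 5); // fetch threads per reducer (default 5)
        System.out.println(conf.get("mapred.child.java.opts"));
      }
    }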
Andy

-----Original Message-----
From: Christopher Douglas [mailto:chri...@yahoo-inc.com]
Sent: Tuesday, March 09, 2010 5:19 PM
To: common-user@hadoop.apache.org
Subject: Re: Shuffle In Memory OutOfMemoryError

No, MR-1182 is included in 0.20.2. What heap size have you set for your reduce tasks? -C

Sent from my iPhone

On Mar 9, 2010, at 2:34 PM, "Ted Yu" <yuzhih...@gmail.com> wrote:

> Andy:
> You need to manually apply the patch.
>
> Cheers
>
> On Tue, Mar 9, 2010 at 2:23 PM, Andy Sautins <andy.saut...@returnpath.net> wrote:
>
>> Thanks Ted. My understanding is that MAPREDUCE-1182 is included in the 0.20.2 release. We upgraded our cluster to 0.20.2 this weekend and re-ran the same job scenarios. Running with mapred.reduce.parallel.copies set to 1, we continue to see the same Java heap space error.
>>
>> -----Original Message-----
>> From: Ted Yu [mailto:yuzhih...@gmail.com]
>> Sent: Tuesday, March 09, 2010 12:56 PM
>> To: common-user@hadoop.apache.org
>> Subject: Re: Shuffle In Memory OutOfMemoryError
>>
>> This issue has been resolved in http://issues.apache.org/jira/browse/MAPREDUCE-1182
>>
>> Please apply the patch M1182-1v20.patch:
>> http://issues.apache.org/jira/secure/attachment/12424116/M1182-1v20.patch
>>
>> On Sun, Mar 7, 2010 at 3:57 PM, Andy Sautins <andy.saut...@returnpath.net> wrote:
>>
>>> Thanks Ted. Very helpful. You are correct that I misunderstood the code at ReduceTask.java:1535. I missed the fact that it's in an IOException catch block. My mistake. That's what I get for being in a rush.
>>>
>>> For what it's worth, I did re-run the job with mapred.reduce.parallel.copies set to values from 5 all the way down to 1. All failed with the same error:
>>>
>>> Error: java.lang.OutOfMemoryError: Java heap space
>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>>
>>> So from that it does seem like something else might be going on, yes? I need to do some more research.
>>>
>>> I appreciate your insights.
>>>
>>> Andy
>>>
>>> -----Original Message-----
>>> From: Ted Yu [mailto:yuzhih...@gmail.com]
>>> Sent: Sunday, March 07, 2010 3:38 PM
>>> To: common-user@hadoop.apache.org
>>> Subject: Re: Shuffle In Memory OutOfMemoryError
>>>
>>> My observation is based on this call chain: MapOutputCopier.run() calls copyOutput(), which calls getMapOutput(), which calls ramManager.canFitInMemory(decompressedLength).
>>>
>>> Basically ramManager.canFitInMemory() makes its decision without considering the number of MapOutputCopiers that are running. Thus 1.25 * 0.7 of the total heap may be used in shuffling if the default parameters are in effect. Of course, you should check the value of mapred.reduce.parallel.copies to see if it is 5. If it is 4 or lower, my reasoning wouldn't apply.
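>>> To make that concrete, here is a quick back-of-the-envelope sketch (illustrative code, not from ReduceTask.java; it just plugs in the defaults and the 640MB heap mentioned at the top of this thread):
>>>
>>>     public class ShuffleBudget {
>>>       public static void main(String[] args) {
>>>         long heap = 640L << 20;        // -Xmx640m
>>>         double bufferPercent = 0.70;   // mapred.job.shuffle.input.buffer.percent default
>>>         double segmentFraction = 0.25; // MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION
>>>         int copiers = 5;               // mapred.reduce.parallel.copies default
>>>
>>>         long maxSize = (long) (heap * bufferPercent);           // in-memory shuffle pool: 448MB
>>>         long perCopierCap = (long) (maxSize * segmentFraction); // maxSingleShuffleLimit: 112MB
>>>         long worstCase = copiers * perCopierCap;                // 5 * 112MB = 560MB
>>>
>>>         // 560MB is 1.25 * maxSize, i.e. 87.5% of a 640MB heap,
>>>         // leaving very little for sort buffers, framework objects, etc.
>>>         System.out.printf("pool=%dMB cap=%dMB worst=%dMB (%.1f%% of heap)%n",
>>>             maxSize >> 20, perCopierCap >> 20, worstCase >> 20,
>>>             100.0 * worstCase / heap);
>>>       }
>>>     }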
>>> About the ramManager.unreserve() call: ReduceTask.java from hadoop 0.20.2 only has 2731 lines, so I have to guess the location of the code snippet you provided. I found this around line 1535:
>>>
>>>       } catch (IOException ioe) {
>>>         LOG.info("Failed to shuffle from " + mapOutputLoc.getTaskAttemptId(), ioe);
>>>
>>>         // Inform the ram-manager
>>>         ramManager.closeInMemoryFile(mapOutputLength);
>>>         ramManager.unreserve(mapOutputLength);
>>>
>>>         // Discard the map-output
>>>         try {
>>>           mapOutput.discard();
>>>         } catch (IOException ignored) {
>>>           LOG.info("Failed to discard map-output from " +
>>>                    mapOutputLoc.getTaskAttemptId(), ignored);
>>>         }
>>>
>>> Please confirm the line number.
>>>
>>> If we're looking at the same code, I am afraid I don't see how we can improve it. First, I assume IOException shouldn't happen that often. Second, mapOutput.discard() just sets data = null for the in-memory case. Even if we called mapOutput.discard() before ramManager.unreserve(), we don't know when GC would kick in and make more memory available. Of course, given the large number of map outputs in your system, it becomes more likely that the root cause in my reasoning made the OOME happen sooner.
>>>
>>> Thanks
>>>
>>> On Sun, Mar 7, 2010 at 1:03 PM, Andy Sautins <andy.saut...@returnpath.net> wrote:
>>>
>>>> Ted,
>>>>
>>>> I'm trying to follow the logic in your mail and I'm not sure I'm following. If you wouldn't mind helping me understand, I would appreciate it.
>>>>
>>>> Looking at the code, maxSingleShuffleLimit is only used in determining whether the copy _can_ fit into memory:
>>>>
>>>>       boolean canFitInMemory(long requestedSize) {
>>>>         return (requestedSize < Integer.MAX_VALUE &&
>>>>                 requestedSize < maxSingleShuffleLimit);
>>>>       }
>>>>
>>>> It also looks like RamManager.reserve should wait until memory is available, so it shouldn't hit a memory limit for that reason.
>>>>
>>>> What does seem a little strange to me is the following (ReduceTask.java starting at 2730):
>>>>
>>>>       // Inform the ram-manager
>>>>       ramManager.closeInMemoryFile(mapOutputLength);
>>>>       ramManager.unreserve(mapOutputLength);
>>>>
>>>>       // Discard the map-output
>>>>       try {
>>>>         mapOutput.discard();
>>>>       } catch (IOException ignored) {
>>>>         LOG.info("Failed to discard map-output from " +
>>>>                  mapOutputLoc.getTaskAttemptId(), ignored);
>>>>       }
>>>>       mapOutput = null;
>>>>
>>>> So to me that looks like the ramManager unreserves the memory before the mapOutput is discarded. Shouldn't the mapOutput be discarded _before_ the ramManager unreserves the memory? If the memory is unreserved before the underlying data references are actually removed, then it seems like another thread could try to allocate memory (ReduceTask.java:2730) before the previous memory is disposed of (mapOutput.discard()).
>>>>
>>>> Not sure that makes sense. One thing to note is that the particular job that is failing does have a good number (200k+) of map outputs. The large number of small map outputs may be why we are triggering a problem.
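>>>> To illustrate the reordering I mean, something like the following (just a sketch rearranging the snippet above, not a tested patch):
>>>>
>>>>       // Sketch: discard the map-output *before* telling the ram-manager
>>>>       // the memory is free, so no other copier can reserve against
>>>>       // bytes that are still referenced.
>>>>       try {
>>>>         mapOutput.discard();         // drops the in-memory data reference
>>>>       } catch (IOException ignored) {
>>>>         LOG.info("Failed to discard map-output from " +
>>>>                  mapOutputLoc.getTaskAttemptId(), ignored);
>>>>       }
>>>>       mapOutput = null;
>>>>
>>>>       // Only now inform the ram-manager
>>>>       ramManager.closeInMemoryFile(mapOutputLength);
>>>>       ramManager.unreserve(mapOutputLength);
>>>>
>>>> Even with that ordering, I realize the bytes are only truly freed when GC collects the discarded buffer, so it may not be the whole story.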
>>>> Thanks again for your thoughts.
>>>>
>>>> Andy
>>>>
>>>> -----Original Message-----
>>>> From: Jacob R Rideout [mailto:apa...@jacobrideout.net]
>>>> Sent: Sunday, March 07, 2010 1:21 PM
>>>> To: common-user@hadoop.apache.org
>>>> Cc: Andy Sautins; Ted Yu
>>>> Subject: Re: Shuffle In Memory OutOfMemoryError
>>>>
>>>> Ted,
>>>>
>>>> Thank you. I filed MAPREDUCE-1571 to cover this issue. I might have some time to write a patch later this week.
>>>>
>>>> Jacob Rideout
>>>>
>>>> On Sat, Mar 6, 2010 at 11:37 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>
>>>>> I think there is a mismatch (in ReduceTask.java) between:
>>>>>
>>>>>       this.numCopiers = conf.getInt("mapred.reduce.parallel.copies", 5);
>>>>>
>>>>> and:
>>>>>
>>>>>       maxSingleShuffleLimit = (long)(maxSize * MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION);
>>>>>
>>>>> where MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION is 0.25f. Because
>>>>>
>>>>>       copiers = new ArrayList<MapOutputCopier>(numCopiers);
>>>>>
>>>>> there are five copiers, each allowed up to 0.25 * maxSize, so the total memory allocated for in-memory shuffle can reach 1.25 * maxSize.
>>>>>
>>>>> A JIRA should be filed to correlate the constant 5 above and MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION.
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Sat, Mar 6, 2010 at 8:31 AM, Jacob R Rideout <apa...@jacobrideout.net> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> We are seeing the following error in the reducers of a particular job:
>>>>>>
>>>>>> Error: java.lang.OutOfMemoryError: Java heap space
>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>>>>>
>>>>>> After enough reducers fail, the entire job fails. The error occurs regardless of whether mapred.compress.map.output is true. We were able to avoid the issue by reducing mapred.job.shuffle.input.buffer.percent to 20%. Shouldn't the framework, via ShuffleRamManager.canFitInMemory and ShuffleRamManager.reserve, correctly detect the memory available for allocation? I would think that with poor configuration settings (and default settings in particular) the job might not be as efficient, but it wouldn't die.
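>>>>>> The kind of accounting I would have expected is sketched below: a toy, hypothetical class (not Hadoop's actual ShuffleRamManager) that checks each reservation against the global budget net of what all copiers currently hold:
>>>>>>
>>>>>>     // Hypothetical sketch only -- not code from ReduceTask.java.
>>>>>>     public class BudgetedRamManager {
>>>>>>       private final long budget;   // e.g. 0.70 * heap
>>>>>>       private long reserved = 0;   // bytes held across all copiers
>>>>>>
>>>>>>       public BudgetedRamManager(long budget) { this.budget = budget; }
>>>>>>
>>>>>>       // Block until the request fits under the global budget,
>>>>>>       // mirroring the "reserve waits for memory" behavior.
>>>>>>       public synchronized void reserve(long size) throws InterruptedException {
>>>>>>         while (size > budget - reserved) {
>>>>>>           wait();
>>>>>>         }
>>>>>>         reserved += size;
>>>>>>       }
>>>>>>
>>>>>>       public synchronized void unreserve(long size) {
>>>>>>         reserved -= size;
>>>>>>         notifyAll();               // wake copiers waiting for room
>>>>>>       }
>>>>>>     }
>>>>>>
>>>>>> With that invariant, a segment that passes a per-segment size check would still have to wait (or spill to disk) rather than push the running total past the heap.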
>>>>>> Here is some more context from the logs; I have attached the full reducer log here: http://gist.github.com/323746
>>>>>>
>>>>>> 2010-03-06 07:54:49,621 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 4191933 bytes (435311 raw bytes) into RAM from attempt_201003060739_0002_m_000061_0
>>>>>> 2010-03-06 07:54:50,222 INFO org.apache.hadoop.mapred.ReduceTask: Task attempt_201003060739_0002_r_000000_0: Failed fetch #1 from attempt_201003060739_0002_m_000202_0
>>>>>> 2010-03-06 07:54:50,223 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201003060739_0002_r_000000_0 adding host hd37.dfs.returnpath.net to penalty box, next contact in 4 seconds
>>>>>> 2010-03-06 07:54:50,223 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201003060739_0002_r_000000_0: Got 1 map-outputs from previous failures
>>>>>> 2010-03-06 07:54:50,223 FATAL org.apache.hadoop.mapred.TaskRunner: attempt_201003060739_0002_r_000000_0 : Map output copy failure : java.lang.OutOfMemoryError: Java heap space
>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>>>>>
>>>>>> We tried this on both 0.20.1 and 0.20.2. We had hoped MAPREDUCE-1182 would address the issue in 0.20.2, but it did not. Does anyone have any comments or suggestions? Is this a bug I should file a JIRA for?
>>>>>>
>>>>>> Jacob Rideout
>>>>>> Return Path
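>>>>>> P.S. The workaround we applied is just the one setting; per job it looks roughly like this (a sketch using the old JobConf API, with a made-up class name):
>>>>>>
>>>>>>     import org.apache.hadoop.mapred.JobConf;
>>>>>>
>>>>>>     public class ShuffleWorkaround {
>>>>>>       public static void main(String[] args) {
>>>>>>         JobConf conf = new JobConf(ShuffleWorkaround.class);
>>>>>>         // Shrink the in-memory shuffle pool from the default 70% of
>>>>>>         // heap to 20%, so even 5 copiers at the 25% per-segment cap
>>>>>>         // stay near 5 * 0.25 * 0.20 = 25% of heap.
>>>>>>         conf.setFloat("mapred.job.shuffle.input.buffer.percent", 0.20f);
>>>>>>       }
>>>>>>     }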