Thanks Christopher. The heap size for reduce tasks is configured to be 640MB (mapred.child.java.opts set to -Xmx640m).
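For reference, those settings can be applied per job through the old JobConf API. The fragment below is only an illustrative sketch (the class name is made up), not our actual job setup:

    import org.apache.hadoop.mapred.JobConf;

    // Illustrative only: where the settings discussed in this thread
    // live in the 0.20-era API.
    public class ShuffleHeapSettings {
      public static void main(String[] args) {
        JobConf conf = new JobConf(ShuffleHeapSettings.class);
        conf.set("mapred.child.java.opts", "-Xmx640m");  // per-task child JVM heap
        conf.setInt("mapred.reduce.parallel.copies", 5); // fetch threads per reducer (default 5)
        System.out.println(conf.get("mapred.child.java.opts"));
      }
    }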
Andy

-----Original Message-----
From: Christopher Douglas [mailto:chri...@yahoo-inc.com]
Sent: Tuesday, March 09, 2010 5:19 PM
To: common-user@hadoop.apache.org
Subject: Re: Shuffle In Memory OutOfMemoryError

No, MR-1182 is included in 0.20.2. What heap size have you set for your reduce tasks? -C

Sent from my iPhone

On Mar 9, 2010, at 2:34 PM, "Ted Yu" <yuzhih...@gmail.com> wrote:

> Andy:
> You need to manually apply the patch.
>
> Cheers
>
> On Tue, Mar 9, 2010 at 2:23 PM, Andy Sautins <andy.saut...@returnpath.net> wrote:
>
>> Thanks Ted. My understanding is that MAPREDUCE-1182 is included in the 0.20.2 release. We upgraded our cluster to 0.20.2 this weekend and re-ran the same job scenarios. Running with mapred.reduce.parallel.copies set to 1, we continue to see the same Java heap space error.
>>
>> -----Original Message-----
>> From: Ted Yu [mailto:yuzhih...@gmail.com]
>> Sent: Tuesday, March 09, 2010 12:56 PM
>> To: common-user@hadoop.apache.org
>> Subject: Re: Shuffle In Memory OutOfMemoryError
>>
>> This issue has been resolved in http://issues.apache.org/jira/browse/MAPREDUCE-1182
>>
>> Please apply the patch M1182-1v20.patch:
>> http://issues.apache.org/jira/secure/attachment/12424116/M1182-1v20.patch
>>
>> On Sun, Mar 7, 2010 at 3:57 PM, Andy Sautins <andy.saut...@returnpath.net> wrote:
>>
>>> Thanks Ted. Very helpful. You are correct that I misunderstood the code at ReduceTask.java:1535. I missed the fact that it's in an IOException catch block. My mistake. That's what I get for being in a rush.
>>>
>>> For what it's worth, I did re-run the job with mapred.reduce.parallel.copies set to values from 5 all the way down to 1. All failed with the same error:
>>>
>>> Error: java.lang.OutOfMemoryError: Java heap space
>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>>
>>> So from that it does seem like something else might be going on, yes? I need to do some more research.
>>>
>>> I appreciate your insights.
>>>
>>> Andy
>>>
>>> -----Original Message-----
>>> From: Ted Yu [mailto:yuzhih...@gmail.com]
>>> Sent: Sunday, March 07, 2010 3:38 PM
>>> To: common-user@hadoop.apache.org
>>> Subject: Re: Shuffle In Memory OutOfMemoryError
>>>
>>> My observation is based on this call chain: MapOutputCopier.run() calls copyOutput(), which calls getMapOutput(), which calls ramManager.canFitInMemory(decompressedLength).
>>>
>>> Basically ramManager.canFitInMemory() makes its decision without considering the number of MapOutputCopiers that are running. Thus 1.25 * 0.7 of the total heap may be used in shuffling if the default parameters are in effect. Of course, you should check the value of mapred.reduce.parallel.copies to see if it is 5. If it is 4 or lower, my reasoning wouldn't apply.
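>>> To make that concrete, here is a quick back-of-the-envelope sketch (illustrative code, not from ReduceTask.java; it just plugs in the defaults and the 640MB heap mentioned at the top of this thread):
>>>
>>>     public class ShuffleBudget {
>>>       public static void main(String[] args) {
>>>         long heap = 640L << 20;        // -Xmx640m
>>>         double bufferPercent = 0.70;   // mapred.job.shuffle.input.buffer.percent default
>>>         double segmentFraction = 0.25; // MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION
>>>         int copiers = 5;               // mapred.reduce.parallel.copies default
>>>
>>>         long maxSize = (long) (heap * bufferPercent);           // in-memory shuffle pool: 448MB
>>>         long perCopierCap = (long) (maxSize * segmentFraction); // maxSingleShuffleLimit: 112MB
>>>         long worstCase = copiers * perCopierCap;                // 5 * 112MB = 560MB
>>>
>>>         // 560MB is 1.25 * maxSize, i.e. 87.5% of a 640MB heap,
>>>         // leaving very little for sort buffers, framework objects, etc.
>>>         System.out.printf("pool=%dMB cap=%dMB worst=%dMB (%.1f%% of heap)%n",
>>>             maxSize >> 20, perCopierCap >> 20, worstCase >> 20,
>>>             100.0 * worstCase / heap);
>>>       }
>>>     }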
>>> About the ramManager.unreserve() call: ReduceTask.java from hadoop 0.20.2 only has 2731 lines, so I have to guess the location of the code snippet you provided. I found this around line 1535:
>>>
>>>       } catch (IOException ioe) {
>>>         LOG.info("Failed to shuffle from " + mapOutputLoc.getTaskAttemptId(), ioe);
>>>
>>>         // Inform the ram-manager
>>>         ramManager.closeInMemoryFile(mapOutputLength);
>>>         ramManager.unreserve(mapOutputLength);
>>>
>>>         // Discard the map-output
>>>         try {
>>>           mapOutput.discard();
>>>         } catch (IOException ignored) {
>>>           LOG.info("Failed to discard map-output from " +
>>>                    mapOutputLoc.getTaskAttemptId(), ignored);
>>>         }
>>>
>>> Please confirm the line number.
>>>
>>> If we're looking at the same code, I am afraid I don't see how we can improve it. First, I assume IOException shouldn't happen that often. Second, mapOutput.discard() just sets data = null for the in-memory case. Even if we called mapOutput.discard() before ramManager.unreserve(), we don't know when GC would kick in and make more memory available. Of course, given the large number of map outputs in your system, it becomes more likely that the root cause in my reasoning made the OOME happen sooner.
>>>
>>> Thanks
>>>
>>> On Sun, Mar 7, 2010 at 1:03 PM, Andy Sautins <andy.saut...@returnpath.net> wrote:
>>>
>>>> Ted,
>>>>
>>>> I'm trying to follow the logic in your mail and I'm not sure I'm following. If you wouldn't mind helping me understand, I would appreciate it.
>>>>
>>>> Looking at the code, maxSingleShuffleLimit is only used in determining whether the copy _can_ fit into memory:
>>>>
>>>>       boolean canFitInMemory(long requestedSize) {
>>>>         return (requestedSize < Integer.MAX_VALUE &&
>>>>                 requestedSize < maxSingleShuffleLimit);
>>>>       }
>>>>
>>>> It also looks like RamManager.reserve should wait until memory is available, so it shouldn't hit a memory limit for that reason.
>>>>
>>>> What does seem a little strange to me is the following (ReduceTask.java starting at 2730):
>>>>
>>>>       // Inform the ram-manager
>>>>       ramManager.closeInMemoryFile(mapOutputLength);
>>>>       ramManager.unreserve(mapOutputLength);
>>>>
>>>>       // Discard the map-output
>>>>       try {
>>>>         mapOutput.discard();
>>>>       } catch (IOException ignored) {
>>>>         LOG.info("Failed to discard map-output from " +
>>>>                  mapOutputLoc.getTaskAttemptId(), ignored);
>>>>       }
>>>>       mapOutput = null;
>>>>
>>>> So to me that looks like the ramManager unreserves the memory before the mapOutput is discarded. Shouldn't the mapOutput be discarded _before_ the ramManager unreserves the memory? If the memory is unreserved before the underlying data references are actually removed, then it seems like another thread could try to allocate memory (ReduceTask.java:2730) before the previous memory is disposed of (mapOutput.discard()).
>>>>
>>>> Not sure that makes sense. One thing to note is that the particular job that is failing does have a good number (200k+) of map outputs. The large number of small map outputs may be why we are triggering a problem.
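>>>> To illustrate the reordering I mean, something like the following (just a sketch rearranging the snippet above, not a tested patch):
>>>>
>>>>       // Sketch: discard the map-output *before* telling the ram-manager
>>>>       // the memory is free, so no other copier can reserve against
>>>>       // bytes that are still referenced.
>>>>       try {
>>>>         mapOutput.discard();         // drops the in-memory data reference
>>>>       } catch (IOException ignored) {
>>>>         LOG.info("Failed to discard map-output from " +
>>>>                  mapOutputLoc.getTaskAttemptId(), ignored);
>>>>       }
>>>>       mapOutput = null;
>>>>
>>>>       // Only now inform the ram-manager
>>>>       ramManager.closeInMemoryFile(mapOutputLength);
>>>>       ramManager.unreserve(mapOutputLength);
>>>>
>>>> Even with that ordering, I realize the bytes are only truly freed when GC collects the discarded buffer, so it may not be the whole story.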
>>>> Thanks again for your thoughts.
>>>>
>>>> Andy
>>>>
>>>> -----Original Message-----
>>>> From: Jacob R Rideout [mailto:apa...@jacobrideout.net]
>>>> Sent: Sunday, March 07, 2010 1:21 PM
>>>> To: common-user@hadoop.apache.org
>>>> Cc: Andy Sautins; Ted Yu
>>>> Subject: Re: Shuffle In Memory OutOfMemoryError
>>>>
>>>> Ted,
>>>>
>>>> Thank you. I filed MAPREDUCE-1571 to cover this issue. I might have some time to write a patch later this week.
>>>>
>>>> Jacob Rideout
>>>>
>>>> On Sat, Mar 6, 2010 at 11:37 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>
>>>>> I think there is a mismatch (in ReduceTask.java) between:
>>>>>
>>>>>       this.numCopiers = conf.getInt("mapred.reduce.parallel.copies", 5);
>>>>>
>>>>> and:
>>>>>
>>>>>       maxSingleShuffleLimit = (long)(maxSize * MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION);
>>>>>
>>>>> where MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION is 0.25f. Because
>>>>>
>>>>>       copiers = new ArrayList<MapOutputCopier>(numCopiers);
>>>>>
>>>>> there are five copiers, each allowed up to 0.25 * maxSize, so the total memory allocated for in-memory shuffle can reach 1.25 * maxSize.
>>>>>
>>>>> A JIRA should be filed to correlate the constant 5 above and MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION.
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Sat, Mar 6, 2010 at 8:31 AM, Jacob R Rideout <apa...@jacobrideout.net> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> We are seeing the following error in the reducers of a particular job:
>>>>>>
>>>>>> Error: java.lang.OutOfMemoryError: Java heap space
>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>>>>>
>>>>>> After enough reducers fail, the entire job fails. The error occurs regardless of whether mapred.compress.map.output is true. We were able to avoid the issue by reducing mapred.job.shuffle.input.buffer.percent to 20%. Shouldn't the framework, via ShuffleRamManager.canFitInMemory and ShuffleRamManager.reserve, correctly detect the memory available for allocation? I would think that with poor configuration settings (and default settings in particular) the job might not be as efficient, but it wouldn't die.
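>>>>>> The kind of accounting I would have expected is sketched below: a toy, hypothetical class (not Hadoop's actual ShuffleRamManager) that checks each reservation against the global budget net of what all copiers currently hold:
>>>>>>
>>>>>>     // Hypothetical sketch only -- not code from ReduceTask.java.
>>>>>>     public class BudgetedRamManager {
>>>>>>       private final long budget;   // e.g. 0.70 * heap
>>>>>>       private long reserved = 0;   // bytes held across all copiers
>>>>>>
>>>>>>       public BudgetedRamManager(long budget) { this.budget = budget; }
>>>>>>
>>>>>>       // Block until the request fits under the global budget,
>>>>>>       // mirroring the "reserve waits for memory" behavior.
>>>>>>       public synchronized void reserve(long size) throws InterruptedException {
>>>>>>         while (size > budget - reserved) {
>>>>>>           wait();
>>>>>>         }
>>>>>>         reserved += size;
>>>>>>       }
>>>>>>
>>>>>>       public synchronized void unreserve(long size) {
>>>>>>         reserved -= size;
>>>>>>         notifyAll();               // wake copiers waiting for room
>>>>>>       }
>>>>>>     }
>>>>>>
>>>>>> With that invariant, a segment that passes a per-segment size check would still have to wait (or spill to disk) rather than push the running total past the heap.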
>>>>>> Here is some more context from the logs; I have attached the full reducer log here: http://gist.github.com/323746
>>>>>>
>>>>>> 2010-03-06 07:54:49,621 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 4191933 bytes (435311 raw bytes) into RAM from attempt_201003060739_0002_m_000061_0
>>>>>> 2010-03-06 07:54:50,222 INFO org.apache.hadoop.mapred.ReduceTask: Task attempt_201003060739_0002_r_000000_0: Failed fetch #1 from attempt_201003060739_0002_m_000202_0
>>>>>> 2010-03-06 07:54:50,223 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201003060739_0002_r_000000_0 adding host hd37.dfs.returnpath.net to penalty box, next contact in 4 seconds
>>>>>> 2010-03-06 07:54:50,223 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201003060739_0002_r_000000_0: Got 1 map-outputs from previous failures
>>>>>> 2010-03-06 07:54:50,223 FATAL org.apache.hadoop.mapred.TaskRunner: attempt_201003060739_0002_r_000000_0 : Map output copy failure : java.lang.OutOfMemoryError: Java heap space
>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>>>>>
>>>>>> We tried this on both 0.20.1 and 0.20.2. We had hoped MAPREDUCE-1182 would address the issue in 0.20.2, but it did not. Does anyone have any comments or suggestions? Is this a bug I should file a JIRA for?
>>>>>>
>>>>>> Jacob Rideout
>>>>>> Return Path
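>>>>>> P.S. The workaround we applied is just the one setting; per job it looks roughly like this (a sketch using the old JobConf API, with a made-up class name):
>>>>>>
>>>>>>     import org.apache.hadoop.mapred.JobConf;
>>>>>>
>>>>>>     public class ShuffleWorkaround {
>>>>>>       public static void main(String[] args) {
>>>>>>         JobConf conf = new JobConf(ShuffleWorkaround.class);
>>>>>>         // Shrink the in-memory shuffle pool from the default 70% of
>>>>>>         // heap to 20%, so even 5 copiers at the 25% per-segment cap
>>>>>>         // stay near 5 * 0.25 * 0.20 = 25% of heap.
>>>>>>         conf.setFloat("mapred.job.shuffle.input.buffer.percent", 0.20f);
>>>>>>       }
>>>>>>     }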