Hi, I thought I might get a professional opinion on this issue (in Hadoop 0.20.2), which was discussed under the title 'Shuffle In Memory OutOfMemoryError' on the common-user mailing list. If someone can explain when reader.close() is supposed to be called, that would be great.
------

Thanks to Andy for the log he provided. You can see from the log below that size increased steadily from 341535057 to 408181692, approaching maxSize, and then the OOME occurred:

2010-03-10 18:38:32,936 INFO org.apache.hadoop.mapred.ReduceTask: reserve: pos=start requestedSize=3893000 size=341535057 numPendingRequests=0 maxSize=417601952
2010-03-10 18:38:32,936 INFO org.apache.hadoop.mapred.ReduceTask: reserve: pos=end requestedSize=3893000 size=345428057 numPendingRequests=0 maxSize=417601952
...
2010-03-10 18:38:35,950 INFO org.apache.hadoop.mapred.ReduceTask: reserve: pos=end requestedSize=635753 size=408181692 numPendingRequests=0 maxSize=417601952
2010-03-10 18:38:36,603 INFO org.apache.hadoop.mapred.ReduceTask: Task attempt_201003101826_0001_r_000004_0: Failed fetch #1 from attempt_201003101826_0001_m_000875_0
2010-03-10 18:38:36,603 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201003101826_0001_r_000004_0 adding host hd17.dfs.returnpath.net to penalty box, next contact in 4 seconds
2010-03-10 18:38:36,604 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201003101826_0001_r_000004_0: Got 1 map-outputs from previous failures
2010-03-10 18:38:36,605 FATAL org.apache.hadoop.mapred.TaskRunner: attempt_201003101826_0001_r_000004_0 : Map output copy failure : java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1513)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1413)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1266)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1200)

Looking at the calls to unreserve() in ReduceTask, two are in IOException handlers and the third is in a sanity check (line 1557), so none of them are reached on the normal execution path.

I see one call in the IFile.InMemoryReader close() method:

    // Inform the RamManager
    ramManager.unreserve(bufferSize);

And InMemoryReader is used in createInMemorySegments():

    Reader<K, V> reader =
        new InMemoryReader<K, V>(ramManager, mo.mapAttemptId,
                                 mo.data, 0, mo.data.length);

But I don't see reader.close() being called anywhere in the ReduceTask file.
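To make the question concrete, below is a minimal, self-contained sketch of the reserve/unreserve accounting as I understand it. The class and method names (SketchRamManager, SketchInMemoryReader, ShuffleAccountingDemo) are my own stand-ins, not the actual Hadoop classes, and the real ShuffleRamManager blocks or spills rather than failing a reservation; the sketch only shows the bookkeeping. The point is that if close() is never invoked, the bytes reserved for a fetched map output are never returned, so the size value in the log can only climb toward maxSize:

    import java.io.IOException;

    // Hypothetical stand-in for the shuffle's RamManager bookkeeping.
    class SketchRamManager {
        private final long maxSize;
        private long size;   // bytes currently reserved (the "size" field in the log)

        SketchRamManager(long maxSize) { this.maxSize = maxSize; }

        synchronized boolean reserve(long requestedSize) {
            // Simplified: the real manager waits or triggers a merge instead of failing.
            if (size + requestedSize > maxSize) return false;
            size += requestedSize;
            return true;
        }

        synchronized void unreserve(long requestedSize) {
            size -= requestedSize;
        }

        synchronized long reserved() { return size; }
    }

    // Hypothetical reader: like IFile.InMemoryReader, its close() hands the
    // reserved bytes back to the manager.
    class SketchInMemoryReader {
        private final SketchRamManager ramManager;
        private final int bufferSize;

        SketchInMemoryReader(SketchRamManager ramManager, byte[] data) {
            this.ramManager = ramManager;
            this.bufferSize = data.length;
        }

        void close() throws IOException {
            // Inform the RamManager, mirroring InMemoryReader.close()
            ramManager.unreserve(bufferSize);
        }
    }

    public class ShuffleAccountingDemo {
        public static void main(String[] args) throws IOException {
            SketchRamManager ram = new SketchRamManager(417601952L);

            byte[] mapOutput = new byte[3893000];
            ram.reserve(mapOutput.length);   // copier reserves before fetching the map output
            SketchInMemoryReader reader = new SketchInMemoryReader(ram, mapOutput);

            System.out.println("reserved before close: " + ram.reserved());
            reader.close();                  // without this, the bytes stay counted as reserved
            System.out.println("reserved after close:  " + ram.reserved());
        }
    }

Run directly, the demo prints 3893000 before close() and 0 after. My assumption is that since reader.close() does not appear in ReduceTask, the real code must rely on some other path to unreserve those buffers, and I would like to know where that is supposed to happen.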
