[
https://issues.apache.org/jira/browse/HADOOP-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12595134#action_12595134
]
Arun C Murthy commented on HADOOP-2095:
---------------------------------------
Some refinements after further discussions...
Prologue: Use a compressed stream for the intermediate sequence files, i.e.
map-outputs, not {record|block}-compressed sequence files. This cuts down the
number of decompressors required at the reducers. Add headers so that the
reducer can query each map to find out the exact compressed and uncompressed
sizes _before_ it copies the data.
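To make the header idea concrete, here is a minimal sketch (plain java.io, invented names, not the actual on-the-wire format) of prefixing a map-output with its compressed and uncompressed lengths so the reducer can look at both _before_ it copies the payload:
{code:java}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/**
 * Sketch only: a tiny header carrying the compressed and uncompressed
 * lengths of one map-output segment, written ahead of the compressed
 * stream so the reducer can size its copy (ramfs vs. disk) up front.
 */
public class MapOutputHeader {
  public final long compressedLength;
  public final long uncompressedLength;

  public MapOutputHeader(long compressedLength, long uncompressedLength) {
    this.compressedLength = compressedLength;
    this.uncompressedLength = uncompressedLength;
  }

  /** Writes the two lengths ahead of the compressed payload. */
  public void write(DataOutputStream out) throws IOException {
    out.writeLong(compressedLength);
    out.writeLong(uncompressedLength);
  }

  /** Reads only the header, leaving the compressed payload untouched. */
  public static MapOutputHeader read(DataInputStream in) throws IOException {
    return new MapOutputHeader(in.readLong(), in.readLong());
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    new MapOutputHeader(1234L, 5678L).write(new DataOutputStream(buf));
    MapOutputHeader h = MapOutputHeader.read(
        new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
    System.out.println("compressed=" + h.compressedLength
        + " uncompressed=" + h.uncompressedLength);
  }
}
{code}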
1) Compute the maximum number of usable decompressors.
2) Download map-outputs until ramfs is full, or we have reached the
decompressors' limit.
3) Trigger InMemFSMergeThread to start the merge. (Currently a new
InMemFSMergeThread is created for every triggered merge; I plan to fix it so
that we use one and only one thread.)
4) If ramfs is full, _suspend_ the shuffle; else keep shuffling into memory.
Essentially the idea is that we pack as much as possible into memory before we
initiate the merge; this saves us trips to disk (for the output of the merge)
which, as Devaraj has shown
(http://issues.apache.org/jira/browse/HADOOP-3297?focusedCommentId=12592816#action_12592816),
leads to much better overall performance.
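To make the flow concrete, here is a toy, single-threaded simulation of steps 1-4 above; all names and numbers are made up (the real ReduceTask machinery is obviously more involved). It packs map-outputs into a pretend ramfs and only spills to disk when the ramfs or decompressor limit is hit:
{code:java}
/**
 * Toy simulation of the shuffle/merge control flow sketched above.
 * Everything here is invented for illustration; it is not ReduceTask code.
 */
public class ShuffleSketch {
  static final long RAMFS_CAPACITY = 100;   // pretend ramfs size (step 2)
  static final int MAX_DECOMPRESSORS = 4;   // usable decompressor limit (step 1)

  static long ramfsUsed = 0;
  static int segmentsInMemory = 0;

  public static void main(String[] args) {
    // Pretend map-output sizes; each one fits in ramfs on its own
    // (oversized outputs are the contra-case handled further down).
    long[] mapOutputs = {30, 25, 20, 40, 10, 15};

    for (long size : mapOutputs) {
      // Step 4: if ramfs is full or the decompressor limit is hit,
      // "suspend" the shuffle -- modelled here as merging synchronously.
      while (ramfsUsed + size > RAMFS_CAPACITY
          || segmentsInMemory >= MAX_DECOMPRESSORS) {
        triggerMerge();                      // step 3: one and only one merge
      }
      // Step 2: shuffle the map-output into ramfs.
      ramfsUsed += size;
      segmentsInMemory++;
      System.out.println("shuffled " + size + " -> ramfs now " + ramfsUsed);
    }
    triggerMerge();                          // merge whatever is left over
  }

  /** Stand-in for InMemFSMergeThread: spills everything in ramfs to disk. */
  static void triggerMerge() {
    System.out.println("merging " + segmentsInMemory
        + " segments (" + ramfsUsed + " units) to disk");
    ramfsUsed = 0;
    segmentsInMemory = 0;
  }
}
{code}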
Of course the above discussion is valid _iff_ we are dealing with small
map-outputs.
For the contra-case where map-outputs are large, we need a threshold which
says: if a map-output is larger than the threshold (say, 10% of the ramfs
size), then shuffle it into memory if possible, else shuffle it to disk. This
ensures that we do not needlessly throttle the shuffle when map-outputs are too
big to fit into ramfs.
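As a sketch of that threshold rule (again with made-up names, and reading the 10% as 10% of the configured ramfs size), the per-output decision could look like this:
{code:java}
/**
 * Illustrative only: decide whether one map-output goes into ramfs or
 * straight to disk. The names and the reading of the 10% threshold are
 * assumptions, not the actual ReduceTask code.
 */
public class ShuffleThreshold {
  static boolean shuffleToMemory(long outputSize, long ramfsCapacity, long ramfsUsed) {
    long largeOutputThreshold = ramfsCapacity / 10;   // the "10%" rule above
    if (outputSize > largeOutputThreshold) {
      // Large output: use memory only if it happens to fit right now;
      // otherwise go straight to disk instead of suspending the shuffle.
      return outputSize <= ramfsCapacity - ramfsUsed;
    }
    // Small output: always target ramfs; the shuffle may suspend until the
    // in-memory merge frees up space (steps 1-4 above).
    return true;
  }

  public static void main(String[] args) {
    long capacity = 200L << 20;                                          // pretend 200MB ramfs
    System.out.println(shuffleToMemory(5L << 20, capacity, 0));          // small -> memory
    System.out.println(shuffleToMemory(150L << 20, capacity, 100L << 20)); // big, no room -> disk
  }
}
{code}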
Thoughts?
----
Before I jump in and make changes, I'm trying to simulate the above behaviour
and will publish some numbers... watch this space.
> Reducer failed due to Out of Memory
> ----------------------------------
>
> Key: HADOOP-2095
> URL: https://issues.apache.org/jira/browse/HADOOP-2095
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.15.0
> Reporter: Runping Qi
> Assignee: Arun C Murthy
> Attachments: HADOOP-2095_CompressedBytesWithCodecPool.patch,
> HADOOP-2095_debug.patch
>
>
> One of the reducers of my job failed with the following exceptions.
> The failure caused the whole job to fail eventually.
> Java heap size was 768MB and sort.io.mb was 140.
> 2007-10-23 19:24:06,100 WARN org.apache.hadoop.mapred.ReduceTask:
> task_200710231912_0001_r_000020_2 Intermediate Merge of the inmemory files
> threw an exception: java.lang.OutOfMemoryError: Java heap space
> at
> org.apache.hadoop.io.compress.DecompressorStream.&lt;init&gt;(DecompressorStream.java:43)
> at
> org.apache.hadoop.io.compress.DefaultCodec.createInputStream(DefaultCodec.java:71)
> at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1345)
> at org.apache.hadoop.io.SequenceFile$Reader.&lt;init&gt;(SequenceFile.java:1231)
> at org.apache.hadoop.io.SequenceFile$Reader.&lt;init&gt;(SequenceFile.java:1154)
> at
> org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(SequenceFile.java:2726)
> at
> org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2543)
> at
> org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2297)
> at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:1311)
> 2007-10-23 19:24:06,102 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200710231912_0001_r_000020_2 done copying
> task_200710231912_0001_m_001428_0 output .
> 2007-10-23 19:24:06,185 INFO org.apache.hadoop.fs.FileSystem: Initialized
> InMemoryFileSystem:
> ramfs://mapoutput31952838/task_200710231912_0001_r_000020_2/map_1423.out-0 of
> size (in bytes): 209715200
> 2007-10-23 19:24:06,193 ERROR org.apache.hadoop.mapred.ReduceTask: Map output
> copy failure: java.lang.NullPointerException
> at
> org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$FileAttributes.access$300(InMemoryFileSystem.java:366)
> at
> org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$InMemoryFileStatus.&lt;init&gt;(InMemoryFileSystem.java:378)
> at
> org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem.getFileStatus(InMemoryFileSystem.java:283)
> at
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
> at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:449)
> at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:738)
> at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:665)
> 2007-10-23 19:24:06,193 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200710231912_0001_r_000020_2 Copying task_200710231912_0001_m_001215_0
> output from xxx
> 2007-10-23 19:24:06,188 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200710231912_0001_r_000020_2 Copying task_200710231912_0001_m_001211_0
> output from xxx
> 2007-10-23 19:24:06,185 ERROR org.apache.hadoop.mapred.ReduceTask: Map output
> copy failure: java.lang.NullPointerException
> at
> org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$InMemoryOutputStream.close(InMemoryFileSystem.java:161)
> at
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:49)
> at
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:64)
> at
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.close(ChecksumFileSystem.java:312)
> at
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:49)
> at
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:64)
> at
> org.apache.hadoop.mapred.MapOutputLocation.getFile(MapOutputLocation.java:253)
> at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:713)
> at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:665)
> 2007-10-23 19:24:06,199 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200710231912_0001_r_000020_2 Copying task_200710231912_0001_m_001247_0
> output from .
> 2007-10-23 19:24:06,200 ERROR org.apache.hadoop.mapred.ReduceTask: Map output
> copy failure: java.lang.NullPointerException
> at
> org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$FileAttributes.access$300(InMemoryFileSystem.java:366)
> at
> org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$InMemoryFileStatus.&lt;init&gt;(InMemoryFileSystem.java:378)
> at
> org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem.getFileStatus(InMemoryFileSystem.java:283)
> at
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
> at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:449)
> at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:738)
> at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:665)
> 2007-10-23 19:24:06,204 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200710231912_0001_r_000020_2 Copying task_200710231912_0001_m_001422_0
> output from .
> 2007-10-23 19:24:06,207 ERROR org.apache.hadoop.mapred.ReduceTask: Map output
> copy failure: java.lang.NullPointerException
> at
> org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$FileAttributes.access$300(InMemoryFileSystem.java:366)
> at
> org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$InMemoryFileStatus.&lt;init&gt;(InMemoryFileSystem.java:378)
> at
> org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem.getFileStatus(InMemoryFileSystem.java:283)
> at
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
> at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:449)
> at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:738)
> at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:665)
> 2007-10-23 19:24:06,209 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200710231912_0001_r_000020_2 Copying task_200710231912_0001_m_001278_0
> output from .
> 2007-10-23 19:24:06,198 WARN org.apache.hadoop.mapred.TaskTracker: Error
> running child
> java.io.IOException: task_200710231912_0001_r_000020_2The reduce copier failed
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:253)
> at
> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
> 2007-10-23 19:24:06,198 ERROR org.apache.hadoop.mapred.ReduceTask: Map output
> copy failure: java.lang.NullPointerException
> at
> org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$FileAttributes.access$300(InMemoryFileSystem.java:366)
> at
> org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$InMemoryFileStatus.&lt;init&gt;(InMemoryFileSystem.java:378)
> at
> org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem.getFileStatus(InMemoryFileSystem.java:283)
> at
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
> at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:449)
> at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:738)
> at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:665)
> 2007-10-23 19:24:06,231 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200710231912_0001_r_000020_2 Copying task_200710231912_0001_m_001531_0
> output from .
> 2007-10-23 19:24:06,197 ERROR org.apache.hadoop.mapred.ReduceTask: Map output
> copy failure: java.lang.NullPointerException
> at
> org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$FileAttributes.access$300(InMemoryFileSystem.java:366)
> at
> org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$InMemoryFileStatus.&lt;init&gt;(InMemoryFileSystem.java:378)
> at
> org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem.getFileStatus(InMemoryFileSystem.java:283)
> at
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
> at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:449)
> at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:738)
> at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:665)
> 2007-10-23 19:24:06,237 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200710231912_0001_r_000020_2 Copying task_200710231912_0001_m_001227_0
> output from .
> 2007-10-23 19:24:06,196 ERROR org.apache.hadoop.mapred.ReduceTask: Map output
> copy failure: java.lang.NullPointerException
> at
> org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$FileAttributes.access$300(InMemoryFileSystem.java:366)
> at
> org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$InMemoryFileStatus.&lt;init&gt;(InMemoryFileSystem.java:378)
> at
> org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem.getFileStatus(InMemoryFileSystem.java:283)
> at
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
> at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:449)
> at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:738)
> at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:665)