Jason Lowe created MAPREDUCE-5168:
-------------------------------------
Summary: Reducer can OOM during shuffle because on-disk output
stream not released
Key: MAPREDUCE-5168
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5168
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 0.23.7
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Critical
If a reducer needs to shuffle a map output to disk, it opens an output stream
and writes the data to disk. However, it does not release its reference to the
output stream within the MapOutput once the write is finished, and that stream
can have a 128K buffer attached to it. If enough of these on-disk outputs are
queued up waiting to be merged, the retained buffers can cause the reducer to
OOM during the shuffle phase.
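Below is a minimal sketch of the kind of fix this implies, written in Java
because Hadoop is. The class OnDiskOutputSketch, its methods, and the use of
java.io.BufferedOutputStream are hypothetical stand-ins, not the actual
MapOutput code. The approach is to close the stream once the map output has
been written, then drop the field reference so the buffer can be garbage
collected while the output waits in the merge queue. Closing alone is not
enough: a closed BufferedOutputStream still holds its internal byte array.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical, simplified stand-in for an on-disk shuffle output.
 * This is NOT Hadoop's MapOutput class; it only illustrates the leak and fix.
 */
class OnDiskOutputSketch {
    // Assumed buffer size, matching the 128K figure in the report.
    static final int BUFFER_SIZE = 128 * 1024;

    // The merger only needs the file path once the shuffle is done.
    private final Path file;
    // The stream (and its buffer) is only needed while shuffling.
    private OutputStream out;

    OnDiskOutputSketch(Path file) throws IOException {
        this.file = file;
        this.out = new BufferedOutputStream(Files.newOutputStream(file), BUFFER_SIZE);
    }

    void write(byte[] data) throws IOException {
        if (out == null) {
            throw new IllegalStateException("output already committed");
        }
        out.write(data);
    }

    /** Called once this map output has been fully shuffled to disk. */
    void commit() throws IOException {
        if (out == null) {
            return; // already committed
        }
        try {
            out.close(); // flushes the buffer and closes the file
        } finally {
            // The fix: drop the reference so the buffer is not kept alive
            // while this object sits in the merge queue.
            out = null;
        }
    }

    boolean holdsStream() {
        return out != null;
    }

    Path getFile() {
        return file;
    }
}
```

Without the `out = null` line, every queued OnDiskOutputSketch would keep its
closed stream, and therefore its buffer, reachable until the merge consumes it.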
In one case I saw, there were 1200 on-disk outputs queued up to be merged,
leading to an extra 150MB of heap pressure from output stream buffers that
were no longer necessary.
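The 150MB figure follows from multiplying the number of queued outputs by the
per-stream buffer size. The helper below is hypothetical and assumes each
queued output pins exactly one full 128K (128 * 1024 byte) buffer.

```java
/** Hypothetical helper: heap pinned by stream buffers that were never released. */
class ShuffleHeapEstimate {
    static long retainedBufferBytes(int queuedOnDiskOutputs, int bufferBytes) {
        // Widen to long before multiplying so large queues cannot overflow int.
        return (long) queuedOnDiskOutputs * bufferBytes;
    }
}
```

For the case above, retainedBufferBytes(1200, 128 * 1024) is 157,286,400
bytes, which is exactly 150 MiB.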
--
This message is automatically generated by JIRA.