Jason Lowe created MAPREDUCE-5168:
-------------------------------------

             Summary: Reducer can OOM during shuffle because on-disk output stream not released
                 Key: MAPREDUCE-5168
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5168
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mrv2
    Affects Versions: 0.23.7
            Reporter: Jason Lowe
            Assignee: Jason Lowe
            Priority: Critical


If a reducer needs to shuffle a map output to disk, it opens an output stream 
and writes the data to disk.  However, it does not release the reference to the 
output stream held within the MapOutput afterwards, and the output stream can 
have a 128K buffer attached to it.  If enough of these on-disk outputs are 
queued up waiting to be merged, the retained buffers can cause the reducer to 
OOM during the shuffle phase.  In one case I saw 1200 on-disk outputs queued up 
to be merged, leading to an extra 150MB of pressure on the heap (1200 x 128K) 
from output stream buffers that were no longer necessary.
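
The pattern described above can be illustrated with a minimal sketch. This is 
not the actual Hadoop code or the eventual patch: the class 
OnDiskOutputSketch, its field disk, and the helper retainedBufferBytes are 
hypothetical names used only for illustration. It assumes the fix amounts to 
closing the stream and dropping the reference once the write finishes, and it 
reproduces the 1200 x 128K arithmetic from the report.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for an on-disk map output; not the real MapOutput class.
class OnDiskOutputSketch {
    // Buffer size taken from the report (128K per output stream).
    static final int BUFFER_BYTES = 128 * 1024;

    // The reference that, if kept after the write, pins the buffer on the heap.
    private OutputStream disk;

    OnDiskOutputSketch(OutputStream sink) {
        this.disk = new BufferedOutputStream(sink, BUFFER_BYTES);
    }

    // Write the shuffled data, then close the stream and drop the reference
    // so the buffer can be collected while this output waits to be merged.
    void shuffle(byte[] data) {
        try (OutputStream out = disk) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            disk = null;
        }
    }

    // True while this output still holds its stream (and therefore its buffer).
    boolean holdsStream() {
        return disk != null;
    }

    // Heap held by stream buffers if every queued output keeps its stream.
    static long retainedBufferBytes(int queuedOutputs, int bufferBytes) {
        return (long) queuedOutputs * bufferBytes;
    }
}
```

With these assumptions, retainedBufferBytes(1200, BUFFER_BYTES) comes to 
157,286,400 bytes, which is the 150MB mentioned above, and holdsStream() 
returns false once shuffle() has completed.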
