[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13498304#comment-13498304
 ] 

Jason Lowe commented on MAPREDUCE-4801:
---------------------------------------

I believe this is caused by how reducers behave during the shuffle.  A reducer 
receives the shuffle header containing the map output size, and the 
MergeManager may then decide that is too much data to receive right now.  In 
that case the reducer doesn't read the subsequent map data and just closes the 
socket.  That leads to IOExceptions when the ShuffleHandler tries to push the 
data to the closed socket.
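
The sequence described above can be reproduced in miniature.  The following is 
only a sketch under stated assumptions, not the actual Hadoop code: java.io 
piped streams stand in for the TCP socket, and the class name, method names 
and the memory-budget check are all hypothetical.  It shows the sender writing 
a size header, the receiver reading it and closing its end without consuming 
the payload, and the sender then failing with an IOException on the write.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

// Hypothetical model of the failure mode; none of these names exist in Hadoop.
class ShuffleCloseSketch {
    // Pipe buffer size. Payloads must fit in it because this demo is single-threaded.
    static final int PIPE_BYTES = 64 * 1024;

    /** Assumed stand-in for the reducer-side "too much data right now" decision. */
    static boolean canAccept(long announcedBytes, long usedBytes, long limitBytes) {
        return usedBytes + announcedBytes <= limitBytes;
    }

    /**
     * Simulates one fetch. Returns the message of the IOException seen by the
     * sending side, or null if the whole payload was written and read.
     */
    static String simulateFetch(int payloadBytes, long usedBytes, long limitBytes)
            throws IOException {
        if (payloadBytes > PIPE_BYTES) {
            throw new IllegalArgumentException("payload must fit in the pipe buffer");
        }
        PipedInputStream reducerEnd = new PipedInputStream(PIPE_BYTES);
        PipedOutputStream senderEnd = new PipedOutputStream(reducerEnd);
        DataInputStream in = new DataInputStream(reducerEnd);
        DataOutputStream out = new DataOutputStream(senderEnd);

        // Sender: write the header that announces the size of the map output.
        out.writeLong(payloadBytes);
        out.flush();

        // Receiver: read the header, then decide whether to take the data.
        long announced = in.readLong();
        boolean accepted = canAccept(announced, usedBytes, limitBytes);
        if (!accepted) {
            in.close(); // give up without reading the map data
        }

        // Sender: unaware of the decision, it pushes the map data anyway.
        try {
            out.write(new byte[payloadBytes]);
            out.flush();
        } catch (IOException e) {
            return e.getMessage(); // this is what would end up in the log
        }

        // Receiver: on the accepted path, consume the payload normally.
        in.readFully(new byte[(int) announced]);
        return null;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("over budget:  " + simulateFetch(1024, 900, 1000));
        System.out.println("under budget: " + simulateFetch(1024, 0, 1_000_000));
    }
}
```

With a real socket the sender would instead see the "Connection reset by peer" 
or "Broken pipe" errors quoted in the description below; the piped streams only 
mimic the ordering of events, not the TCP behavior.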
                
> ShuffleHandler can generate large logs due to prematurely closed channels
> -------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4801
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4801
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.23.3, 2.0.1-alpha
>            Reporter: Jason Lowe
>            Priority: Critical
>
> We ran into an instance where many nodes on a cluster ran out of disk space 
> because the nodemanager logs were huge.  Examining the logs showed many, many 
> shuffle errors due to either ClosedChannelException or IOException from 
> "Connection reset by peer" or "Broken pipe".

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
