[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717531#comment-13717531
 ] 

Jason Lowe commented on MAPREDUCE-5251:
---------------------------------------

I see reportLocalError is now throwing UnknownHostException.  Unfortunately, 
since that is an IOException, if it is ever thrown it will be caught by the 
outer try-catch block in copyMapOutput and a map attempt will be blamed for 
it.
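
To make the concern concrete, here is a simplified sketch of the control 
flow.  It is not the actual Fetcher code: only the names reportLocalError and 
copyMapOutput come from the patch, while readRemoteHeader, writeToLocalDisk, 
the returned strings, and the simulated failures are assumptions made up for 
illustration.

```java
import java.io.IOException;
import java.net.UnknownHostException;

final class LocalErrorBlameSketch {

    // Hypothetical stand-in for the remote read; it succeeds in this sketch.
    static void readRemoteHeader() throws IOException {
    }

    // Hypothetical stand-in for the local write; simulates a full disk.
    static void writeToLocalDisk() throws IOException {
        throw new IOException("No space left on device (simulated)");
    }

    // Stand-in for reportLocalError; simulates a failed hostname lookup.
    static void reportLocalError(IOException cause) throws UnknownHostException {
        throw new UnknownHostException("hostname lookup failed (simulated)");
    }

    // UnknownHostException is a subclass of IOException, so it escapes the
    // inner handler and lands in the outer catch that blames the map attempt.
    public static String copyMapOutputUnguarded() {
        try {
            readRemoteHeader();
            try {
                writeToLocalDisk();
                return "SUCCESS";
            } catch (IOException localError) {
                reportLocalError(localError);
                return "REDUCER_LOCAL_ERROR";
            }
        } catch (IOException ioe) {
            return "MAP_ATTEMPT_BLAMED";
        }
    }

    // One possible fix: handle the lookup failure where it happens.
    public static String copyMapOutputGuarded() {
        try {
            readRemoteHeader();
            try {
                writeToLocalDisk();
                return "SUCCESS";
            } catch (IOException localError) {
                try {
                    reportLocalError(localError);
                } catch (UnknownHostException uhe) {
                    // Handled here so it cannot reach the outer catch.
                }
                return "REDUCER_LOCAL_ERROR";
            }
        } catch (IOException ioe) {
            return "MAP_ATTEMPT_BLAMED";
        }
    }

    public static void main(String[] args) {
        System.out.println(copyMapOutputUnguarded());
        System.out.println(copyMapOutputGuarded());
    }
}
```

In this sketch the unguarded variant returns MAP_ATTEMPT_BLAMED even though 
the failure was local to the reducer, while the guarded variant returns 
REDUCER_LOCAL_ERROR.  Not declaring the exception on reportLocalError at all 
would be another way to get the same effect.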

Also now that I think of it, we arguably should be incrementing the ioErrs 
counter before calling reportLocalError since this is an I/O error during the 
shuffle that prevented a successful map output transfer.
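
A minimal sketch of that ordering follows.  The ioErrs counter is modeled 
here as an AtomicLong, and the events list and the handleLocalShuffleFailure 
name are hypothetical; the real counter type and call site may differ.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

final class IoErrsOrderingSketch {

    // Assumption: a plain counter standing in for the shuffle's ioErrs.
    static final AtomicLong ioErrs = new AtomicLong();

    // Hypothetical log used only to observe the order of operations.
    static final List<String> events = new ArrayList<>();

    // Stand-in for reportLocalError; records the counter value it observes.
    static void reportLocalError(IOException cause) {
        events.add("reportLocalError(ioErrs=" + ioErrs.get() + ")");
    }

    // Count the I/O error first, then report the local failure.
    public static void handleLocalShuffleFailure(IOException cause) {
        ioErrs.incrementAndGet();
        reportLocalError(cause);
    }

    public static void main(String[] args) {
        handleLocalShuffleFailure(new IOException("disk full (simulated)"));
        System.out.println(events + " total=" + ioErrs.get());
    }
}
```

One practical effect of incrementing first is that the failed transfer is 
still counted even if the report call itself does not return normally.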
                
> Reducer should not implicate map attempt if it has insufficient space to 
> fetch map output
> -----------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5251
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5251
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.7, 2.0.4-alpha
>            Reporter: Jason Lowe
>            Assignee: Ashwin Shankar
>         Attachments: MAPREDUCE-5251-2.txt, MAPREDUCE-5251-3.txt, 
> MAPREDUCE-5251-4.txt, MAPREDUCE-5251-5.txt
>
>
> A job can fail if a reducer happens to run on a node with insufficient space 
> to hold a map attempt's output.  The reducer keeps reporting the map attempt 
> as bad, and if the map attempt ends up being re-launched too many times 
> before the reducer decides that it may be the real problem, the job can fail.
> In that scenario it would be better to re-launch the reduce attempt in the 
> hope that it runs on another node with sufficient space to complete 
> the shuffle.  Reporting the map attempt as bad and relaunching the map task 
> doesn't change the fact that the reducer can't hold the output.

