[ https://issues.apache.org/jira/browse/HADOOP-3462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616834#action_12616834 ]

Amareshwari Sriramadasu commented on HADOOP-3462:
-------------------------------------------------

Currently, task attempts that fail due to factors such as disk errors, network
errors, shuffle errors, filesystem errors, etc. are marked FAILED. If there are
four such failures on a tasktracker, the tracker gets blacklisted. And if the
attempts of a TIP fail four times due to such factors, the TIP is killed,
thereby killing the job.

To prevent such failures from killing the job, the attempts could be marked
something like FAILED_INTERNAL; these failures would still count toward
blacklisting the trackers, but would not count against the TIP's retry limit,
and so would not kill the job.
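To make the proposal concrete, here is a minimal sketch (plain Java, not
actual Hadoop code; AttemptOutcome, MAX_TRACKER_FAILURES, MAX_TIP_FAILURES,
and the method names are all hypothetical) of the intended accounting: both
kinds of failure feed the tracker's blacklist counter, but only ordinary
FAILED attempts are charged against the TIP.

{code:java}
// Minimal sketch, NOT actual Hadoop code: all names are hypothetical.
public class FailureAccounting {

  enum AttemptOutcome { SUCCEEDED, FAILED, FAILED_INTERNAL }

  static final int MAX_TRACKER_FAILURES = 4; // blacklist threshold per tracker
  static final int MAX_TIP_FAILURES = 4;     // retry limit per TIP

  int trackerFailures = 0; // failures observed on one tracker
  int tipFailures = 0;     // failures charged against one TIP

  /** Returns true if the TIP (and hence the job) must be killed. */
  boolean attemptCompleted(AttemptOutcome outcome) {
    if (outcome == AttemptOutcome.SUCCEEDED) {
      return false;
    }
    // Any failure, internal or not, counts toward blacklisting the tracker.
    trackerFailures++;
    if (trackerFailures >= MAX_TRACKER_FAILURES) {
      System.out.println("tracker blacklisted");
    }
    // Only ordinary failures are charged against the TIP's retry limit;
    // a FAILED_INTERNAL attempt would just be rescheduled.
    if (outcome == AttemptOutcome.FAILED) {
      tipFailures++;
      if (tipFailures >= MAX_TIP_FAILURES) {
        return true; // kills the TIP, thereby killing the job
      }
    }
    return false;
  }

  public static void main(String[] args) {
    FailureAccounting acct = new FailureAccounting();
    // Four shuffle-style internal failures: the tracker gets blacklisted,
    // but the job survives because tipFailures never increments.
    for (int i = 0; i < 4; i++) {
      System.out.println("job killed: "
          + acct.attemptCompleted(AttemptOutcome.FAILED_INTERNAL));
    }
  }
}
{code}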

Thoughts?

> reduce task failures during shuffling should not count against number of 
> retry attempts
> ---------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3462
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3462
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.16.3
>            Reporter: Christian Kunz
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.19.0
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
