[ https://issues.apache.org/jira/browse/HADOOP-3462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600633#action_12600633 ]

ckunz edited comment on HADOOP-3462 at 5/28/08 3:29 PM:
-----------------------------------------------------------------

We have a long-running job with multiple waves of maps. Occasionally, 
TaskTrackers on nodes with failing disks fail repeatedly, and enough of them 
do so that reduce tasks run out of attempts before these TaskTrackers get 
blacklisted.

Wouldn't it be better to count reduce task failures against the number of 
attempts only when the reduce application is actually running, and not 
already during shuffling?
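
To make the proposal concrete, here is a minimal, self-contained sketch of 
the suggested retry accounting (plain Java, not Hadoop's actual 
TaskInProgress code; the Phase enum, onAttemptFailed method, and 
MAX_ATTEMPTS constant are hypothetical names): an attempt that fails during 
shuffle is rescheduled without consuming a retry, while a failure in the 
reduce phase proper still counts.

{code:java}
// Sketch of the proposed retry accounting. All names here
// (Phase, RetrySketch, onAttemptFailed) are hypothetical, not Hadoop APIs.
public class RetrySketch {
  enum Phase { SHUFFLE, SORT, REDUCE }

  // Attempt budget, in the spirit of mapred.reduce.max.attempts (default 4).
  static final int MAX_ATTEMPTS = 4;

  private int countedFailures = 0;

  // Returns true if the attempt should be rescheduled,
  // false if the task has run out of attempts.
  boolean onAttemptFailed(Phase phaseAtFailure) {
    if (phaseAtFailure == Phase.SHUFFLE) {
      // Proposed rule: shuffle-time failures (often caused by another
      // node's bad disk, not by this task) are retried without being counted.
      return true;
    }
    countedFailures++;
    return countedFailures < MAX_ATTEMPTS;
  }

  public static void main(String[] args) {
    RetrySketch tip = new RetrySketch();
    // Many shuffle-time failures never exhaust the budget...
    for (int i = 0; i < 10; i++) {
      System.out.println("shuffle failure, reschedule: "
          + tip.onAttemptFailed(Phase.SHUFFLE));
    }
    // ...but failures while the user's reduce code runs still do.
    boolean reschedule = true;
    while (reschedule) {
      reschedule = tip.onAttemptFailed(Phase.REDUCE);
    }
    System.out.println("task failed after " + MAX_ATTEMPTS
        + " counted attempts");
  }
}
{code}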

> reduce task failures during shuffling should not count against number of 
> retry attempts
> ---------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3462
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3462
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.16.3
>            Reporter: Christian Kunz
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
