[
https://issues.apache.org/jira/browse/HADOOP-3462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600633#action_12600633
]
ckunz edited comment on HADOOP-3462 at 5/28/08 3:29 PM:
-----------------------------------------------------------------
We have a long-running job with multiple waves of maps. Occasionally, TaskTrackers
on nodes with failing disks fail repeatedly, and there are enough of them that
reduce tasks run out of attempts before these TaskTrackers get blacklisted.
Wouldn't it be better to count reduce task failures against the number of
attempts only when the reduce application is actually running, and not already
during shuffling?
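To illustrate the proposed policy, here is a minimal sketch (not actual Hadoop code; the class, enum, and method names are all hypothetical) of a retry counter that charges a failure against the attempt limit only when it happens in the reduce phase proper, while shuffle- and sort-phase failures are retried for free:

```java
// Hypothetical sketch of the proposed retry accounting -- not Hadoop's
// real TaskInProgress logic. Only REDUCE-phase failures consume attempts.
public class ReduceRetryPolicy {
    public enum Phase { SHUFFLE, SORT, REDUCE }

    private final int maxAttempts;
    private int chargedFailures = 0;

    public ReduceRetryPolicy(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    /** Record an attempt failure; shuffle/sort failures are not charged. */
    public void recordFailure(Phase phase) {
        if (phase == Phase.REDUCE) {
            chargedFailures++;
        }
        // SHUFFLE and SORT failures trigger a retry without consuming
        // one of the task's limited attempts.
    }

    public int chargedFailures() {
        return chargedFailures;
    }

    /** True once the task has exhausted its charged attempts. */
    public boolean shouldFailTask() {
        return chargedFailures >= maxAttempts;
    }
}
```

Under this scheme, a reduce task fetching from a wave of TaskTrackers with bad disks could fail the shuffle many times without ever approaching the attempt limit.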
> reduce task failures during shuffling should not count against number of
> retry attempts
> ---------------------------------------------------------------------------------------
>
> Key: HADOOP-3462
> URL: https://issues.apache.org/jira/browse/HADOOP-3462
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.16.3
> Reporter: Christian Kunz
>
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.