[ https://issues.apache.org/jira/browse/HADOOP-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641714#action_12641714 ]
Vinod K V commented on HADOOP-4305:
-----------------------------------

bq. Can we count this only from successful jobs?

Even that wouldn't be enough. Blacklisting of a TT by a successful job would only mean that this TT is not suitable for running this job. We can't generalize it to say that this TT is not fit for running any job. The latter can be concluded only by monitoring TT health, which should be done independently of job failures.

The proposal here doesn't seem to be the right fix. If we are concerned about batches of similar jobs, and about the same jobs being repeatedly submitted, we can address the issue by introducing the concept of a batch and by linking a batch's jobs with something like a 'batch-id'. By default, all jobs would belong to the default batch. We could then consider this batch-id when blacklisting TTs. Thoughts?

> repeatedly blacklisted tasktrackers should get declared dead
> -------------------------------------------------------------
>
>                 Key: HADOOP-4305
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4305
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Christian Kunz
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.20.0
>
>
> When running a batch of jobs it often happens that the same tasktrackers are blacklisted again and again. This can slow job execution considerably, in particular when tasks fail because of a timeout.
> It would make sense to no longer assign any tasks to such tasktrackers and to declare them dead.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
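
[Editor's illustration] To make the batch-id proposal above concrete, here is a minimal sketch of the bookkeeping the JobTracker side could keep. The class name, method names, and threshold handling are all hypothetical and are not part of the existing mapred code; it only assumes each job carries a batch-id string (with all jobs falling into a default batch when none is set) and that a tracker is treated as dead for a batch once enough of that batch's jobs have blacklisted it.

{code:java}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not existing JobTracker code: counts how many jobs
// in a batch have blacklisted a given TaskTracker and declares the tracker
// dead for that batch once a configurable threshold is reached.
public class BatchBlacklistTracker {

  // Number of jobs in the same batch that must blacklist a tracker
  // before it is considered dead for that batch.
  private final int threshold;

  // batch-id -> (tracker name -> number of jobs that blacklisted it)
  private final Map<String, Map<String, Integer>> counts =
      new HashMap<String, Map<String, Integer>>();

  // batch-id -> trackers already declared dead for that batch
  private final Map<String, Set<String>> dead =
      new HashMap<String, Set<String>>();

  public BatchBlacklistTracker(int threshold) {
    this.threshold = threshold;
  }

  // Called whenever a job belonging to batchId blacklists trackerName.
  public synchronized void jobBlacklistedTracker(String batchId,
                                                 String trackerName) {
    Map<String, Integer> perTracker = counts.get(batchId);
    if (perTracker == null) {
      perTracker = new HashMap<String, Integer>();
      counts.put(batchId, perTracker);
    }
    Integer old = perTracker.get(trackerName);
    int updated = (old == null ? 0 : old.intValue()) + 1;
    perTracker.put(trackerName, updated);
    if (updated >= threshold) {
      Set<String> deadSet = dead.get(batchId);
      if (deadSet == null) {
        deadSet = new HashSet<String>();
        dead.put(batchId, deadSet);
      }
      deadSet.add(trackerName);
    }
  }

  // The scheduler would consult this before assigning a task of a job
  // in batchId to trackerName.
  public synchronized boolean isDeadForBatch(String batchId,
                                             String trackerName) {
    Set<String> deadSet = dead.get(batchId);
    return deadSet != null && deadSet.contains(trackerName);
  }
}
{code}

In this sketch the "dead" state is scoped to a batch rather than global, which matches the comment's point that a blacklist from one job (or batch) should not by itself disqualify a tracker for unrelated work; cluster-wide removal would still need independent TT health monitoring.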