[ https://issues.apache.org/jira/browse/HADOOP-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641714#action_12641714 ]

Vinod K V commented on HADOOP-4305:
-----------------------------------

bq. Can we count this only from successful jobs?
Even that wouldn't be enough. Blacklisting of a TT by a successful job would 
only mean that this TT is not suitable for running this job. We can't 
generalize it to say that this TT is not fit for running any job. The latter 
can be concluded only by monitoring TT health, which should be done 
independently of job failures.

The proposal here doesn't seem to be the right fix. If we are concerned about 
batch jobs (similar jobs) and about the same jobs being submitted repeatedly, 
we can address the issue by introducing the concept of a batch and linking 
batch jobs with something like a 'batch-id'. By default, all jobs would belong 
to a default batch. We could then consider this batch-id when deciding whether 
to blacklist TTs. Thoughts?
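
To make the batch-id idea concrete, here is a minimal sketch of how blacklist 
counts could be kept per batch instead of per job. Everything in it (the class 
name, the threshold, the method names) is made up for illustration; it is not 
existing JobTracker code.

{code:java}
// Hypothetical sketch only: track per-batch blacklist counts for tasktrackers.
// None of these names come from the Hadoop code base; they just illustrate the
// batch-id idea described above.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class BatchBlacklistTracker {

  /** A tasktracker is dropped for a batch after this many jobs in the batch
   *  have individually blacklisted it (illustrative threshold). */
  private static final int MAX_BLACKLISTS_PER_BATCH = 4;

  /** batch-id -> (tracker name -> number of jobs in the batch that have
   *  blacklisted the tracker). */
  private final Map<String, Map<String, Integer>> counts =
      new HashMap<String, Map<String, Integer>>();

  /** batch-id -> trackers excluded for the whole batch. */
  private final Map<String, Set<String>> excluded =
      new HashMap<String, Set<String>>();

  /** Called whenever a single job blacklists a tracker. Jobs that carry no
   *  explicit batch-id would pass "default" here. */
  public synchronized void jobBlacklistedTracker(String batchId, String tracker) {
    Map<String, Integer> perBatch = counts.get(batchId);
    if (perBatch == null) {
      perBatch = new HashMap<String, Integer>();
      counts.put(batchId, perBatch);
    }
    Integer old = perBatch.get(tracker);
    int updated = (old == null ? 0 : old.intValue()) + 1;
    perBatch.put(tracker, updated);

    if (updated >= MAX_BLACKLISTS_PER_BATCH) {
      Set<String> dead = excluded.get(batchId);
      if (dead == null) {
        dead = new HashSet<String>();
        excluded.put(batchId, dead);
      }
      dead.add(tracker);
    }
  }

  /** The scheduler would consult this before assigning a task from a job in
   *  the given batch to the given tracker. */
  public synchronized boolean isExcluded(String batchId, String tracker) {
    Set<String> dead = excluded.get(batchId);
    return dead != null && dead.contains(tracker);
  }
}
{code}

The point of keying counts on batch-id rather than job-id is that a tracker 
which keeps getting blacklisted for the same kind of work is avoided for the 
rest of that batch, without generalizing a single job's failures into 
declaring the tracker dead for every job.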

> repeatedly blacklisted tasktrackers should get declared dead
> ------------------------------------------------------------
>
>                 Key: HADOOP-4305
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4305
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Christian Kunz
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.20.0
>
>
> When running a batch of jobs, it often happens that the same tasktrackers are 
> blacklisted again and again. This can slow job execution considerably, in 
> particular when tasks fail because of timeouts.
> It would make sense to no longer assign any tasks to such tasktrackers and to 
> declare them dead.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
