[ https://issues.apache.org/jira/browse/HADOOP-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644151#action_12644151 ]

Owen O'Malley commented on HADOOP-4305:
---------------------------------------

One thing that concerns me is that there needs to be a policy for degrading 
the information over time. How about a policy where we define the maximum 
number of blacklists (from successful jobs) a TaskTracker can be on and still 
get new tasks? Furthermore, once each day the counter for each host is 
decremented by 1. (To avoid a massive re-enablement, I'd probably use 
hostname.hashCode() % 24 as the hour at which to decrement the count for each 
host.) So if you set the threshold to 4, it would give you:
  1. Any TT blacklisted by 4 (or more) jobs would not get new tasks.
  2. Each day, the JT would forgive one blacklist and likely re-enable the TT.
  3. A re-enabled TT would only get one chance for that day.
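
Roughly, the bookkeeping on the JT side could look something like the sketch 
below. This is just to make the policy concrete; the class and method names 
are made up for illustration, not existing JobTracker code, and it assumes the 
JT already knows which successful jobs blacklisted which trackers and calls 
hourlyForgiveness() once an hour.

import java.util.HashMap;
import java.util.Map;

public class BlacklistForgivenessPolicy {
  private final int threshold;                        // e.g. 4
  private final Map<String, Integer> blacklistCount = // host -> # of job blacklists
      new HashMap<String, Integer>();

  public BlacklistForgivenessPolicy(int threshold) {
    this.threshold = threshold;
  }

  /** A successful job reported that it blacklisted this tracker. */
  public synchronized void jobBlacklisted(String hostname) {
    Integer c = blacklistCount.get(hostname);
    blacklistCount.put(hostname, (c == null) ? 1 : c + 1);
  }

  /** Trackers on 'threshold' or more per-job blacklists get no new tasks. */
  public synchronized boolean canAssignTasks(String hostname) {
    Integer c = blacklistCount.get(hostname);
    return c == null || c < threshold;
  }

  /**
   * Called once an hour with the current hour of day (0-23). Each host is
   * forgiven one blacklist per day, in the hour given by its
   * hostname.hashCode() % 24, so re-enablement is spread across the day
   * instead of happening all at once.
   */
  public synchronized void hourlyForgiveness(int hourOfDay) {
    for (Map.Entry<String, Integer> e : blacklistCount.entrySet()) {
      int forgivenessHour = (e.getKey().hashCode() % 24 + 24) % 24;
      if (forgivenessHour == hourOfDay && e.getValue() > 0) {
        e.setValue(e.getValue() - 1);
      }
    }
  }
}

With threshold = 4, a tracker blacklisted by four successful jobs stops 
getting tasks, and it takes one quiet day per blacklist to work its way back.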

Thoughts?

> repeatedly blacklisted tasktrackers should get declared dead
> ------------------------------------------------------------
>
>                 Key: HADOOP-4305
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4305
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Christian Kunz
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.20.0
>
>
> When running a batch of jobs, it often happens that the same tasktrackers are 
> blacklisted again and again. This can slow job execution considerably, in 
> particular when tasks fail because of timeouts.
> It would make sense to no longer assign any tasks to such tasktrackers and to 
> declare them dead.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
