[ https://issues.apache.org/jira/browse/HADOOP-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649594#action_12649594 ]

Devaraj Das commented on HADOOP-4305:
-------------------------------------

1. Change convertTrackerNameToHostName so that it does not include the "tracker_" 
prefix, and remove the extra List introduced in JobInProgress.
2. Faults in PotentiallyFaultyList should also be decremented if there are no new 
faults on that tracker for 24 hours (see the sketch after this list).
3. Make AVERAGE_BLACKLIST_THRESHOLD configurable, but do not expose it publicly 
(this will give us a way to tune it until we reach the correct number).
4. Put more comments in the code explaining the algorithm.
5. Change the name of addFaultyTracker to incrementFaults.
6. A ReinitAction on lostTaskTracker should not erase the tracker's faults.
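
For illustration, here is a minimal sketch of the bookkeeping that items 2, 3, and 5 
describe. All names in it (FaultyTrackersInfo, FaultInfo, decrementStaleFaults, the 
constructor parameter) are assumptions for the sketch, not the identifiers used in 
the attached patches:

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-tracker fault bookkeeping; not the actual patch.
public class FaultyTrackersInfo {
  private static final long FAULT_TIMEOUT_WINDOW = 24 * 60 * 60 * 1000L; // 24 hours

  // Item 3: tunable internally, but not exposed as a public config knob.
  private final double averageBlacklistThreshold;

  private static class FaultInfo {
    int numFaults;
    long lastUpdated; // time of the most recent fault on this tracker
  }

  private final Map<String, FaultInfo> potentiallyFaulty = new HashMap<>();
  private int totalFaults; // sum over all trackers, used for the average

  public FaultyTrackersInfo(double averageBlacklistThreshold) {
    this.averageBlacklistThreshold = averageBlacklistThreshold;
  }

  // Item 5: record one more fault for this tracker (was addFaultyTracker).
  public synchronized void incrementFaults(String hostName) {
    FaultInfo fi = potentiallyFaulty.computeIfAbsent(hostName, h -> new FaultInfo());
    fi.numFaults++;
    fi.lastUpdated = System.currentTimeMillis();
    totalFaults++;
  }

  // Item 2: age out one fault if the tracker has been quiet for 24 hours.
  public synchronized void decrementStaleFaults(String hostName, long now) {
    FaultInfo fi = potentiallyFaulty.get(hostName);
    if (fi != null && fi.numFaults > 0 && now - fi.lastUpdated > FAULT_TIMEOUT_WINDOW) {
      fi.numFaults--;
      totalFaults--;
      fi.lastUpdated = now;
      if (fi.numFaults == 0) {
        potentiallyFaulty.remove(hostName);
      }
    }
  }

  // A tracker is blacklisted when its fault count exceeds the cluster-wide
  // average by the (internally tunable) threshold factor.
  public synchronized boolean isBlacklisted(String hostName) {
    FaultInfo fi = potentiallyFaulty.get(hostName);
    if (fi == null) {
      return false;
    }
    double avg = (double) totalFaults / potentiallyFaulty.size();
    return fi.numFaults > averageBlacklistThreshold * avg;
  }
}
{code}

With this shape, erasing a tracker's entry on reinit would be a single remove() call, 
which is exactly what item 6 says should not happen when a lost tracker re-registers.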


> repeatedly blacklisted tasktrackers should get declared dead
> ------------------------------------------------------------
>
>                 Key: HADOOP-4305
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4305
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Christian Kunz
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.20.0
>
>         Attachments: patch-4305-0.18.txt, patch-4305-1.txt
>
>
> When running a batch of jobs, it often happens that the same tasktrackers are 
> blacklisted again and again. This can slow job execution considerably, in 
> particular when tasks fail because of timeouts.
> It would make sense to stop assigning tasks to such tasktrackers and to 
> declare them dead.
