[ https://issues.apache.org/jira/browse/HADOOP-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652368#action_12652368 ]
Devaraj Das commented on HADOOP-4305:
-------------------------------------

Some comments:
1. Format the if-condition brackets properly in the incrementFaults method.
2. You should be able to use the same data structure for both potentiallyFaulty and blacklisted trackers.
3. Add a comment for mapred.cluster.average.blacklist.threshold stating that it is there solely for tuning purposes, and that once this feature has been tested on real clusters and an appropriate value for the threshold has been found, this config might be taken out.
4. Check whether you can remove the initialContact flag and use only the restarted flag in the heartbeat method. This is a more serious change, but it might be worthwhile for simplifying the state machine.

> repeatedly blacklisted tasktrackers should get declared dead
> ------------------------------------------------------------
>
>                 Key: HADOOP-4305
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4305
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Christian Kunz
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.20.0
>
>         Attachments: patch-4305-0.18.txt, patch-4305-1.txt, patch-4305-2.txt
>
>
> When running a batch of jobs it often happens that the same tasktrackers are
> blacklisted again and again. This can slow job execution considerably, in
> particular, when tasks fail because of timeout.
> It would make sense to no longer assign any tasks to such tasktrackers and to
> declare them dead.
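Review comments 2 and 3 can be made concrete with a minimal Java sketch. It is not taken from any of the attached patches: the class FaultyTrackers, its FaultInfo entries, the minFaults parameter, and the blacklisting rule itself (a tracker's fault count must exceed the cluster-wide average by more than a threshold fraction) are all hypothetical, and avgThreshold merely stands in for mapred.cluster.average.blacklist.threshold. It also assumes the caller passes the current number of trackers in the cluster (at least 1).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the HADOOP-4305 patch: a single map holds both
// "potentially faulty" and "blacklisted" trackers, distinguished by a flag.
class FaultyTrackers {
  // Per-tracker state; one entry per tracker that has faulted at least once.
  static final class FaultInfo {
    int numFaults;       // number of times jobs have blacklisted this tracker
    boolean blacklisted; // true once the tracker is blacklisted cluster-wide
  }

  private final Map<String, FaultInfo> trackers = new HashMap<>();
  private final int minFaults;       // assumed: faults required before cluster-wide blacklisting
  private final double avgThreshold; // plays the role of mapred.cluster.average.blacklist.threshold

  FaultyTrackers(int minFaults, double avgThreshold) {
    this.minFaults = minFaults;
    this.avgThreshold = avgThreshold;
  }

  // Records one more fault for the tracker; returns true if it is (now) blacklisted.
  boolean incrementFaults(String tracker, int clusterSize) {
    FaultInfo info = trackers.computeIfAbsent(tracker, k -> new FaultInfo());
    info.numFaults++;
    if (!info.blacklisted && shouldBlacklist(info.numFaults, clusterSize)) {
      info.blacklisted = true;
    }
    return info.blacklisted;
  }

  boolean isBlacklisted(String tracker) {
    FaultInfo info = trackers.get(tracker);
    return info != null && info.blacklisted;
  }

  // Assumed rule: blacklist only trackers whose fault count is well above the
  // cluster-wide average, so a bad job that fails on every tracker does not
  // blacklist the whole cluster.
  private boolean shouldBlacklist(int numFaults, int clusterSize) {
    if (numFaults < minFaults) {
      return false;
    }
    int total = 0;
    for (FaultInfo fi : trackers.values()) {
      total += fi.numFaults;
    }
    // Trackers without an entry count as having 0 faults; avg > 0 here
    // because total includes this tracker's numFaults.
    double avg = (double) total / clusterSize;
    return (numFaults - avg) / avg > avgThreshold;
  }
}
```

With one map, a tracker moves from "potentially faulty" to "blacklisted" by flipping a flag rather than being migrated between two collections, which keeps the two views from drifting out of sync.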