[
https://issues.apache.org/jira/browse/MAPREDUCE-481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Allen Wittenauer resolved MAPREDUCE-481.
----------------------------------------
Resolution: Fixed
Closing this out as stale.
Blacklisting had a lot more work done to it in 1.x, and this isn't relevant
for 2.x anymore.
> Improvements to Global Black-listing of TaskTrackers
> ----------------------------------------------------
>
> Key: MAPREDUCE-481
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-481
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Arun C Murthy
>
> HADOOP-4305 added a global black-list of tasktrackers.
> We saw a scenario on one of our clusters where a few jobs caused a lot of
> tasktrackers to immediately be blacklisted. This was caused by a specific set
> of jobs (from the same user) whose tasks were shot down by the TaskTracker
> for being over the vmem limit of 2G. Each of these jobs had over 600 failures
> of the same kind. This resulted in each of the users black-listing some
> tasktrackers, which in itself is wrong since the failures had nothing to do
> with the node on which the failure occurred (i.e. high memory usage) and
> shouldn't have penalized the tasktracker. We clearly need to start
> treating system and user failures separately for black-listing etc. A
> DiskError is fatal and should probably be blacklisted immediately while a
> task which was 'failed' for using more memory shouldn't count against the
> tasktracker at all!
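The split between system and user failures described above could be sketched roughly as follows. This is only an illustration under assumptions: FailureKind, TrackerFaultCounter, and the threshold parameter are hypothetical names invented here, not Hadoop classes, and the handling of unclassified failures is a guess at one reasonable policy.

```java
// Hypothetical sketch; none of these types exist in Hadoop.
enum FailureKind {
    DISK_ERROR,        // node-level fault: the tracker itself is suspect
    MEMORY_LIMIT_KILL, // task killed for exceeding its vmem limit: user fault
    OTHER              // unclassified task failure
}

final class TrackerFaultCounter {
    private final int threshold; // unclassified failures tolerated before blacklisting
    private int faults = 0;
    private boolean blacklisted = false;

    TrackerFaultCounter(int threshold) {
        this.threshold = threshold;
    }

    void recordFailure(FailureKind kind) {
        switch (kind) {
            case DISK_ERROR:
                // Fatal for the node: blacklist immediately.
                blacklisted = true;
                break;
            case MEMORY_LIMIT_KILL:
                // Caused by the job, not the node: do not count it at all.
                break;
            default:
                // Assumption: other failures accumulate toward the threshold.
                faults++;
                if (faults >= threshold) {
                    blacklisted = true;
                }
                break;
        }
    }

    boolean isBlacklisted() {
        return blacklisted;
    }

    int faults() {
        return faults;
    }
}
```

With a policy like this, the 600 memory-limit kills per job in the scenario above would leave the tracker's fault count at zero, while a single DiskError would blacklist it at once.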
> The other problem is that we never configured mapred.max.tracker.blacklists
> and continue to use the default value of 4. Furthermore, this config should
> really be a percent of the cluster-size and not a whole number.
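As a rough sketch of the percentage idea, a whole-number limit could be derived from the cluster size as shown below. BlacklistThreshold and fromPercent are hypothetical names, and both the round-up behaviour and the floor of 1 are assumptions made for this example, not anything specified in the issue.

```java
// Hypothetical helper; not part of Hadoop.
final class BlacklistThreshold {
    private BlacklistThreshold() {}

    /**
     * Converts a percentage of the cluster into a whole-number threshold.
     * Assumption: round up, and never return less than 1 so that a small
     * cluster or a tiny percentage still yields a usable limit.
     */
    static int fromPercent(double percent, int clusterSize) {
        if (percent < 0.0 || percent > 100.0) {
            throw new IllegalArgumentException("percent must be in [0, 100]");
        }
        if (clusterSize < 0) {
            throw new IllegalArgumentException("clusterSize must be non-negative");
        }
        int threshold = (int) Math.ceil(clusterSize * percent / 100.0);
        return Math.max(1, threshold);
    }
}
```

For example, BlacklistThreshold.fromPercent(1.0, 2000) gives 20, so the limit grows with the cluster instead of staying at a fixed value such as 4.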