job failing because of reassigning same tasktracker to failing tasks
--------------------------------------------------------------------
Key: HADOOP-3333
URL: https://issues.apache.org/jira/browse/HADOOP-3333
Project: Hadoop Core
Issue Type: Bug
Components: mapred
Affects Versions: 0.16.3
Reporter: Christian Kunz
Priority: Blocker
We are running a job in a 2nd attempt. The previous job failed, and the
current job risks failing as well, because reduce tasks that fail on marginal
TaskTrackers are repeatedly assigned to the same TaskTrackers (probably because
those are the only available slots), eventually running out of attempts.
Reduce tasks should be assigned to the same TaskTracker at most twice, or
TaskTrackers need to get some better smarts to detect failing hardware (a
sketch of such a cap follows below).
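A minimal sketch of the kind of per-task bookkeeping that could enforce such a
cap, assuming the scheduler consults it before handing a task back to a
tracker. All class and method names here are hypothetical illustrations, not
Hadoop's actual API:
{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-task record of which TaskTrackers an attempt has
// already failed on, so the scheduler can skip trackers that have
// failed this task too many times.
public class TaskFailureTracker {
    // Cap on how often one task may be scheduled on the same tracker
    // (the "at most twice" proposed above).
    private static final int MAX_ATTEMPTS_PER_TRACKER = 2;

    // Failure count per TaskTracker name, for this task only.
    private final Map<String, Integer> failuresByTracker =
        new HashMap<String, Integer>();

    // Record that an attempt of this task failed on the given tracker.
    public void recordFailure(String trackerName) {
        Integer count = failuresByTracker.get(trackerName);
        failuresByTracker.put(trackerName, count == null ? 1 : count + 1);
    }

    // True if the scheduler may still offer this task to the tracker,
    // i.e. it has failed there fewer than MAX_ATTEMPTS_PER_TRACKER times.
    public boolean canScheduleOn(String trackerName) {
        Integer count = failuresByTracker.get(trackerName);
        return count == null || count < MAX_ATTEMPTS_PER_TRACKER;
    }
}
{code}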
BTW, mapred.reduce.max.attempts=12, which is high, but it does not help in
this case because every retry can land on the same marginal tracker.
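For reference, the limit in question is set in hadoop-site.xml; as noted
above, raising it does not help because it is a cluster-wide cap on total
attempts, not a per-TaskTracker one:
{code:xml}
<!-- hadoop-site.xml: total attempts allowed per reduce task. -->
<!-- There is no corresponding per-TaskTracker cap, which is the problem. -->
<property>
  <name>mapred.reduce.max.attempts</name>
  <value>12</value>
</property>
{code}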