Lost tasktracker leads to hung jobs
-----------------------------------
Key: HADOOP-1060
URL: https://issues.apache.org/jira/browse/HADOOP-1060
Project: Hadoop
Issue Type: Bug
Components: mapred
Affects Versions: 0.12.0
Reporter: Arun C Murthy
Priority: Critical
Fix For: 0.12.1
When the JobTracker detects that a TaskTracker is 'lost', it tries to fail that
tracker's incomplete tasks and completed map tasks, but the attempt itself fails
with:
2007-03-03 00:38:24,056 ERROR org.apache.hadoop.mapred.JobTracker: Tracker Expiry Thread got exception: java.lang.IndexOutOfBoundsException: Index: 310, Size: 307
        at java.util.ArrayList.RangeCheck(ArrayList.java:546)
        at java.util.ArrayList.get(ArrayList.java:321)
        at org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:342)
        at org.apache.hadoop.mapred.JobInProgress.failedTask(JobInProgress.java:862)
        at org.apache.hadoop.mapred.JobTracker.lostTaskTracker(JobTracker.java:1637)
        at org.apache.hadoop.mapred.JobTracker$ExpireTrackers.run(JobTracker.java:269)
        at java.lang.Thread.run(Thread.java:595)
As a result, the tasks are never marked as 'failed': the JT continues to assume
they are running and never restarts them, which leaves the job hung.
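
The failure mode described above can be illustrated with a small, self-contained
sketch. This is not the Hadoop code and not the fix for this issue: the class
LostTrackerSketch, its State enum, and the methods failTask, failAllUnguarded,
failAllGuarded, and newTasks are all hypothetical names invented for the
example. It only assumes what the trace shows, namely that a task index (310)
can exceed the size of the list it is looked up in (307), and it shows how an
exception thrown partway through a "fail these tasks" loop leaves the remaining
tasks looking as if they were still running.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; not Hadoop's JobInProgress.
class LostTrackerSketch {
    enum State { RUNNING, FAILED }

    // Builds a list of n tasks, all initially RUNNING.
    static List<State> newTasks(int n) {
        List<State> tasks = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            tasks.add(State.RUNNING);
        }
        return tasks;
    }

    // Marks one task FAILED. ArrayList.set range-checks the index and
    // throws IndexOutOfBoundsException when taskIndex >= tasks.size().
    static void failTask(List<State> tasks, int taskIndex) {
        tasks.set(taskIndex, State.FAILED);
    }

    // Unguarded loop: the first stale index aborts the whole pass, so
    // every task after it is left in the RUNNING state.
    static int failAllUnguarded(List<State> tasks, int[] indices) {
        int failed = 0;
        for (int index : indices) {
            failTask(tasks, index);
            failed++;
        }
        return failed;
    }

    // Guarded loop (a generic defensive pattern, assumed for illustration):
    // a stale index is reported and skipped, and the pass continues.
    static int failAllGuarded(List<State> tasks, int[] indices) {
        int failed = 0;
        for (int index : indices) {
            try {
                failTask(tasks, index);
                failed++;
            } catch (IndexOutOfBoundsException e) {
                System.err.println("Skipping stale task index " + index);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        int[] indices = {0, 310, 1}; // 310 is out of range for 307 tasks

        List<State> unguarded = newTasks(307);
        try {
            failAllUnguarded(unguarded, indices);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("Unguarded pass aborted: " + e.getMessage());
        }
        System.out.println("Task 1 after unguarded pass: " + unguarded.get(1));

        List<State> guarded = newTasks(307);
        int failed = failAllGuarded(guarded, indices);
        System.out.println("Guarded pass failed " + failed + " tasks");
        System.out.println("Task 1 after guarded pass: " + guarded.get(1));
    }
}
```

In the unguarded pass, task 1 is never reached and stays RUNNING, which mirrors
the hang reported here. Catching the exception only contains the damage; the
underlying cause, an index that does not match the list it is applied to, would
still need to be fixed separately.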
--
This message is automatically generated by JIRA.