Lost tasktracker leads to hung jobs
-----------------------------------

                 Key: HADOOP-1060
                 URL: https://issues.apache.org/jira/browse/HADOOP-1060
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.12.0
            Reporter: Arun C Murthy
            Priority: Critical
             Fix For: 0.12.1

When the JobTracker detects that a TaskTracker is 'lost' and tries to fail that tracker's incomplete tasks and completed map tasks, it fails with:

2007-03-03 00:38:24,056 ERROR org.apache.hadoop.mapred.JobTracker: Tracker Expiry Thread got exception: java.lang.IndexOutOfBoundsException: Index: 310, Size: 307
        at java.util.ArrayList.RangeCheck(ArrayList.java:546)
        at java.util.ArrayList.get(ArrayList.java:321)
        at org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:342)
        at org.apache.hadoop.mapred.JobInProgress.failedTask(JobInProgress.java:862)
        at org.apache.hadoop.mapred.JobTracker.lostTaskTracker(JobTracker.java:1637)
        at org.apache.hadoop.mapred.JobTracker$ExpireTrackers.run(JobTracker.java:269)
        at java.lang.Thread.run(Thread.java:595)

This means the tasks are never marked 'failed': the JobTracker keeps assuming they are running and never restarts them, which leaves the job hung.
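
To illustrate the failure mode described above, here is a minimal, self-contained sketch. It is not the Hadoop code: the class LostTrackerSketch, its methods, and the task indexes 5 and 6 are hypothetical. Only the out-of-range index 310 against a list of size 307 is taken from the trace. The sketch shows how one bad index passed to ArrayList.get aborts the whole pass, so tasks later in the same pass stay marked as running. The bounds-checked variant is included only for contrast and is not claimed to be the fix applied for this issue.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; none of these names come from the Hadoop source.
class LostTrackerSketch {
    enum State { RUNNING, FAILED }

    // Builds a status list with n entries, all initially RUNNING.
    static List<State> newStatuses(int n) {
        return new ArrayList<>(Collections.nCopies(n, State.RUNNING));
    }

    // Marks each listed task as FAILED. An out-of-range index makes
    // List.get throw, which aborts the loop before later tasks are handled.
    static void failTasksUnchecked(List<State> statuses, int[] taskIndexes) {
        for (int idx : taskIndexes) {
            if (statuses.get(idx) == State.RUNNING) {
                statuses.set(idx, State.FAILED);
            }
        }
    }

    // Contrast case (an assumption, not the actual fix): skip indexes that
    // are out of range and keep going. Returns how many were skipped.
    static int failTasksChecked(List<State> statuses, int[] taskIndexes) {
        int skipped = 0;
        for (int idx : taskIndexes) {
            if (idx < 0 || idx >= statuses.size()) {
                skipped++;
                continue;
            }
            statuses.set(idx, State.FAILED);
        }
        return skipped;
    }

    public static void main(String[] args) {
        // 310 is the out-of-range index from the trace; 5 and 6 are made up.
        int[] lostTrackerTasks = {5, 310, 6};

        List<State> statuses = newStatuses(307);
        try {
            failTasksUnchecked(statuses, lostTrackerTasks);
        } catch (IndexOutOfBoundsException e) {
            // Comparable to the expiry thread logging the exception and moving on.
            System.out.println("expiry pass aborted: " + e.getMessage());
        }
        // Task 6 was never reached, so it is still RUNNING here.
        System.out.println("task 6 after unchecked pass: " + statuses.get(6));

        List<State> fresh = newStatuses(307);
        int skipped = failTasksChecked(fresh, lostTrackerTasks);
        System.out.println("task 6 after checked pass: " + fresh.get(6)
                + " (skipped " + skipped + ")");
    }
}
```

In the unchecked pass, the task handled before the bad index is failed, while the one after it is left untouched. That matches the symptom in the report: a task the JobTracker still believes is running and therefore never reschedules.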