Lost tasktracker leads to hung jobs
-----------------------------------

                 Key: HADOOP-1060
                 URL: https://issues.apache.org/jira/browse/HADOOP-1060
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.12.0
            Reporter: Arun C Murthy
            Priority: Critical
             Fix For: 0.12.1


When the JobTracker detects that a TaskTracker is 'lost', it tries to fail that
tracker's incomplete tasks and its completed map tasks. The attempt itself fails
with:
2007-03-03 00:38:24,056 ERROR org.apache.hadoop.mapred.JobTracker: Tracker Expiry Thread got exception: java.lang.IndexOutOfBoundsException: Index: 310, Size: 307
        at java.util.ArrayList.RangeCheck(ArrayList.java:546)
        at java.util.ArrayList.get(ArrayList.java:321)
        at org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:342)
        at org.apache.hadoop.mapred.JobInProgress.failedTask(JobInProgress.java:862)
        at org.apache.hadoop.mapred.JobTracker.lostTaskTracker(JobTracker.java:1637)
        at org.apache.hadoop.mapred.JobTracker$ExpireTrackers.run(JobTracker.java:269)
        at java.lang.Thread.run(Thread.java:595)

As a result, the tasks are never marked as failed: the JT keeps assuming they
are running and never restarts them, which leaves the job hung.
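
To illustrate the failure mode only, here is a minimal Java sketch. It is not
Hadoop code: the class LostTrackerSketch, the State enum, and the methods
newJob and failTasks are hypothetical names invented for this example, and the
report does not say why the index exceeded the list size. The sketch just shows
how an out-of-range ArrayList access partway through a "fail these tasks" pass
aborts the pass and leaves the remaining tasks looking as if they still run.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; none of these names come from Hadoop.
public class LostTrackerSketch {
    enum State { RUNNING, FAILED }

    // Builds per-task bookkeeping for a job with `size` tasks, all RUNNING.
    static List<State> newJob(int size) {
        List<State> states = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            states.add(State.RUNNING);
        }
        return states;
    }

    // Marks each listed task as FAILED. An index >= states.size() makes
    // ArrayList throw IndexOutOfBoundsException, which ends the loop early.
    static void failTasks(List<State> states, int[] taskIndexes) {
        for (int idx : taskIndexes) {
            states.set(idx, State.FAILED);
        }
    }

    public static void main(String[] args) {
        List<State> states = newJob(307);   // same size as in the log above
        int[] lostTasks = {5, 310, 6};      // 310 is out of range, as in the log
        try {
            failTasks(states, lostTasks);
        } catch (IndexOutOfBoundsException e) {
            // Assumption: the caller only logs the exception, so the
            // remaining tasks are never revisited.
            System.out.println("expiry pass aborted");
        }
        System.out.println("task 5: " + states.get(5));
        System.out.println("task 6: " + states.get(6));
    }
}
```

In this sketch task 5 ends up FAILED, while task 6, which comes after the bad
index, stays RUNNING and would never be rescheduled. That matches the hang
described above.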


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
