Jobs are getting stuck on JobTracker restart
--------------------------------------------

                 Key: HADOOP-5325
                 URL: https://issues.apache.org/jira/browse/HADOOP-5325
             Project: Hadoop Core
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.20.0
            Reporter: Karam Singh
            Priority: Blocker


The recovery manager failed to recover the job properly and threw:

    2009-02-25 09:24:09,944 INFO org.apache.hadoop.fs.FSInputChecker: Found checksum error: b[7168, 7168]=
    org.apache.hadoop.fs.ChecksumException: Checksum error: file filename

As part of recovery, one of the attempts was added to the expire-launching-tasks
list. However, another attempt of the same TIP (task in progress) was relaunched
with the same attempt ID, and the job got stuck. It looks like the expired
attempt was marked as failed, while the newly launched attempt (with the same
attempt ID) succeeded, so the JobTracker tried to mark a failed TIP as
successful. As a result, the TIP is considered failed, yet tip.isComplete()
returns true.
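The inconsistent end state can be reproduced in a toy model. This is a minimal, hypothetical Python sketch, not Hadoop's TaskInProgress or JobTracker code. It assumes that attempts are tracked only by attempt ID (so a relaunched attempt that reuses an ID is indistinguishable from the expired one) and that expiring the attempt exhausts the TIP's retries, so the whole TIP is marked failed. `SimpleTip`, `replay`, the placeholder ID "attempt_0", and the `reject_stale_success` guard are illustrative names only; the guard is one possible mitigation, not the change that resolved HADOOP-5325.

```python
from enum import Enum


class AttemptState(Enum):
    RUNNING = "RUNNING"
    FAILED = "FAILED"
    SUCCEEDED = "SUCCEEDED"


class SimpleTip:
    """Hypothetical, simplified model of a task in progress (TIP).

    Not Hadoop's TaskInProgress: attempts are keyed by attempt ID only,
    so two attempts sharing an ID collapse into one entry.
    """

    def __init__(self, reject_stale_success=False):
        self.attempts = {}      # attempt ID -> AttemptState
        self.failed = False     # TIP-level failure flag
        self.reject_stale_success = reject_stale_success

    def launch(self, attempt_id):
        self.attempts[attempt_id] = AttemptState.RUNNING

    def expire(self, attempt_id):
        # Assumption: expiring this attempt exhausts the TIP's retries,
        # so the whole TIP is marked failed.
        self.attempts[attempt_id] = AttemptState.FAILED
        self.failed = True

    def report_success(self, attempt_id):
        """Apply a success report; return True if it was accepted."""
        if self.reject_stale_success and (
            self.failed or self.attempts.get(attempt_id) is AttemptState.FAILED
        ):
            # Hypothetical guard: a success for a failed TIP, or for an ID
            # already marked FAILED, is treated as stale and dropped.
            return False
        self.attempts[attempt_id] = AttemptState.SUCCEEDED
        return True

    def is_complete(self):
        # Analogous to tip.isComplete(): true once any attempt succeeded.
        return any(s is AttemptState.SUCCEEDED for s in self.attempts.values())

    def is_consistent(self):
        # A TIP should never be both failed and complete.
        return not (self.failed and self.is_complete())


def replay(tip, attempt_id="attempt_0"):
    """Replay the reported sequence; the attempt ID is a placeholder."""
    tip.launch(attempt_id)          # attempt recovered after the restart
    tip.expire(attempt_id)          # lands on the expiry list, marked failed
    tip.report_success(attempt_id)  # relaunched attempt with the SAME ID succeeds
    return tip


if __name__ == "__main__":
    buggy = replay(SimpleTip())
    print(buggy.failed, buggy.is_complete())      # True True
    guarded = replay(SimpleTip(reject_stale_success=True))
    print(guarded.failed, guarded.is_complete())  # True False
```

Without the guard the model ends with the TIP both failed and complete, matching the symptom in the report. The guard only drops the stale success report; a real fix would more likely avoid reusing the attempt ID on relaunch, which this sketch does not attempt to model.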


-- 
This message is automatically generated by JIRA.