Jobs are getting stuck on JobTracker restart
-----------------------------------------------
Key: HADOOP-5325
URL: https://issues.apache.org/jira/browse/HADOOP-5325
Project: Hadoop Core
Issue Type: Bug
Components: mapred
Affects Versions: 0.20.0
Reporter: Karam Singh
Priority: Blocker
The recovery manager failed to recover the job properly and threw:

2009-02-25 09:24:09,944 INFO org.apache.hadoop.fs.FSInputChecker: Found
checksum error: b[7168, 7168]=
org.apache.hadoop.fs.ChecksumException: Checksum error: file filename

As part of recovery, one of the attempts was added to the expiry launching
tasks list, but another attempt of the same tip was relaunched with the same
attempt id, and the job got stuck. It looks like the expired attempt was
marked as failed while the newly launched attempt (with the same attempt id)
succeeded. The JobTracker then tried to mark a failed tip as successful, so
the tip is considered failed even though tip.isComplete returns true.
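
The sequence described above can be replayed with a small toy model. This is
only a sketch of the suspected state conflict, not the JobTracker code: the
class DuplicateAttemptSketch, the nested Tip class, and the methods launch,
expire, succeed and replay are all hypothetical names. It also assumes, purely
to keep the example short, that a single expired attempt marks the whole tip
as failed.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: every name here is hypothetical and does not mirror the
// real JobTracker or TaskInProgress implementation.
public class DuplicateAttemptSketch {
    public enum State { RUNNING, FAILED, SUCCEEDED }

    public static class Tip {
        // Attempt id -> last state recorded for that id.
        private final Map<String, State> attempts = new HashMap<>();
        private boolean failed = false;
        private boolean complete = false;

        public void launch(String attemptId) {
            // Reusing an id silently overwrites the earlier attempt's state.
            attempts.put(attemptId, State.RUNNING);
        }

        public void expire(String attemptId) {
            attempts.put(attemptId, State.FAILED);
            failed = true; // assumption: one expired attempt fails the tip
        }

        public void succeed(String attemptId) {
            attempts.put(attemptId, State.SUCCEEDED);
            complete = true; // the failed flag is never cleared
        }

        public boolean isFailed() { return failed; }
        public boolean isComplete() { return complete; }
    }

    // Replays the order of events from the report using one attempt id.
    public static Tip replay() {
        Tip tip = new Tip();
        String id = "attempt_0"; // placeholder id, not taken from the logs
        tip.launch(id);   // attempt restored during recovery
        tip.expire(id);   // it lands on the expiry list and is marked failed
        tip.launch(id);   // a second attempt is launched with the same id
        tip.succeed(id);  // and that one succeeds
        return tip;
    }

    public static void main(String[] args) {
        Tip tip = replay();
        // Both flags end up true, which is the conflicting state in the report.
        System.out.println("failed=" + tip.isFailed()
                + " complete=" + tip.isComplete());
    }
}
```

In this model the tip ends up both failed and complete, so a caller that
checks either flag alone reaches a different conclusion about the job.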