[ https://issues.apache.org/jira/browse/MAPREDUCE-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205824#comment-13205824 ]
Vinod Kumar Vavilapalli commented on MAPREDUCE-3846:
----------------------------------------------------
Sharad, I think MAPREDUCE-3802 is a different issue, even though the exception
trace is the same.
The problem here is within the second-generation AM itself. The erring task
has multiple attempts, and one of them never gets logged to JobHistory because
the TaskAttempt fails before launch. Today we log TaskAttempts and set start
times only after the real JVM launch (do you know why? Maybe we can change
this?). Because of this, JobHistory knows about, say, attempts 0, 1 and 3.
When we replay the completed tasks, the attempt numbers become 0, 1 and 2, and
so we get the NPE.
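To make the mismatch concrete, here is a minimal sketch of the numbering
problem. It is a simplified model, not the actual MRv2 recovery code: the
class AttemptReplaySketch, its methods, and the map standing in for JobHistory
are all hypothetical names chosen for illustration, and it assumes replay
renumbers attempts contiguously from 0.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: the map stands in for the attempts JobHistory recorded,
// keyed by attempt number. This is not the real MRv2 recovery code.
public class AttemptReplaySketch {

    // Assumption: replay renumbers the recorded attempts contiguously from 0.
    public static List<Integer> replayedIds(Map<Integer, String> history) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < history.size(); i++) {
            ids.add(i);
        }
        return ids;
    }

    // Replayed ids with no history entry. A lookup for these returns null,
    // and dereferencing that result is where an NPE would come from.
    public static List<Integer> missingIds(Map<Integer, String> history) {
        List<Integer> missing = new ArrayList<>();
        for (int id : replayedIds(history)) {
            if (history.get(id) == null) {
                missing.add(id);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<Integer, String> history = new HashMap<>();
        history.put(0, "FAILED");
        history.put(1, "FAILED");
        // Attempt 2 failed before launch, so it was never logged.
        history.put(3, "SUCCEEDED");

        System.out.println("replayed ids: " + replayedIds(history));
        System.out.println("ids missing from history: " + missingIds(history));
    }
}
```

With history holding attempts 0, 1 and 3, the replayed ids are 0, 1 and 2, and
id 2 has no history entry. Logging attempts that fail before launch, or
guarding the lookup against null, would each avoid the mismatch in this model;
which one fits the real code is a separate question.
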
> Restarted+Recovered AM hangs in some corner cases
> -------------------------------------------------
>
> Key: MAPREDUCE-3846
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3846
> Project: Hadoop Map/Reduce
> Issue Type: Sub-task
> Components: mrv2
> Affects Versions: 0.23.0
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Vinod Kumar Vavilapalli
> Priority: Critical
> Attachments: MAPREDUCE-3846-20120210.txt
>
>
> [~karams] found this while testing the AM restart/recovery feature. After
> the first-generation AM crashes (manually killed with kill -9), the
> second-generation AM starts, but hangs after a while.