[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205824#comment-13205824
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-3846:
----------------------------------------------------

Sharad, I think MAPREDUCE-3802 is different even though the exception trace is 
the same.

The problem here lies in the second AM generation itself. For the erring
task, there are multiple attempts. One of the attempts doesn't get logged to
JobHistory because the TaskAttempt fails before it is even launched. Today we
log TaskAttempts and set start times only after the real JVM launch (Do you
know why? Maybe we can change this?). Because of this, JobHistory knows about,
say, attempts 0, 1 and 3. When we replay the completed tasks, the attempt
numbers are assigned as 0, 1, 2, which no longer match the logged ones, and so
we get the NPE.
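A minimal Java sketch of the numbering mismatch described above. It assumes a simplified model in which the attempts recovered from JobHistory sit in a map keyed by attempt number, and replay numbers attempts sequentially from 0 up to the count of logged attempts. `RecoveryReplaySketch`, `AttemptInfo` and `firstMissingAttempt` are hypothetical names used only for illustration, not Hadoop MRv2 classes or APIs, and the start times are made-up values.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the recovery replay; NOT actual Hadoop MRv2 code.
public class RecoveryReplaySketch {

    // Stand-in for the per-attempt data recovered from JobHistory (hypothetical).
    static final class AttemptInfo {
        final int attemptNumber;
        final long startTime;

        AttemptInfo(int attemptNumber, long startTime) {
            this.attemptNumber = attemptNumber;
            this.startTime = startTime;
        }
    }

    // Replays attempts by numbering them sequentially 0..n-1 (n = logged count)
    // and looks each number up in the recovered history. Returns the first
    // replayed number with no history entry, or -1 if all were found.
    static int firstMissingAttempt(Map<Integer, AttemptInfo> history) {
        for (int replayed = 0; replayed < history.size(); replayed++) {
            AttemptInfo info = history.get(replayed);
            if (info == null) {
                // Dereferencing info here (e.g. info.startTime) would throw the NPE.
                return replayed;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Map<Integer, AttemptInfo> history = new HashMap<>();
        // Attempt 2 failed before launch, so it was never logged to JobHistory.
        history.put(0, new AttemptInfo(0, 1000L));
        history.put(1, new AttemptInfo(1, 2000L));
        history.put(3, new AttemptInfo(3, 4000L));
        // Prints 2: the replayed number 2 has no logged counterpart.
        System.out.println("Replayed attempt with no history entry: "
                + firstMissingAttempt(history));
    }
}
```

In this model, logging every attempt (including ones that fail before launch), or replaying by the logged attempt numbers instead of a fresh counter, would make the lookup succeed.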
                
> Restarted+Recovered AM hangs in some corner cases
> -------------------------------------------------
>
>                 Key: MAPREDUCE-3846
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3846
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Critical
>         Attachments: MAPREDUCE-3846-20120210.txt
>
>
> [~karams] found this while testing AM restart/recovery feature. After the 
> first generation AM crashes (manually killed by kill -9), the second 
> generation AM starts, but hangs after a while.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
