[ https://issues.apache.org/jira/browse/MAPREDUCE-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinod Kumar Vavilapalli updated MAPREDUCE-3846:
-----------------------------------------------
Attachment: MAPREDUCE-3846-20120210.txt
If we log all TaskAttempts (even before launch), we may perhaps avoid this, but
I am not sure. So for now, I changed the attempt-number generation during
recovery to first reuse the numbers from the previous generation and to jump
ahead only after all of those numbers are exhausted.
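
A minimal sketch of that numbering scheme, not the code in the attached patch.
The class name AttemptNumberAllocator and the jumpBase parameter (the first
number reserved for the new AM generation) are hypothetical, and handing out
the recovered numbers in ascending order is an assumption:

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.TreeSet;

/** Hypothetical allocator for attempt numbers in a recovered AM. */
class AttemptNumberAllocator {
    // Attempt numbers seen in the previous generation, in ascending order.
    private final Deque<Integer> recovered;
    // Next number to hand out once the recovered numbers are exhausted.
    private int nextFresh;

    AttemptNumberAllocator(Collection<Integer> previousNumbers, int jumpBase) {
        // Sort and de-duplicate the numbers read back from the previous generation.
        TreeSet<Integer> sorted = new TreeSet<>(previousNumbers);
        this.recovered = new ArrayDeque<>(sorted);
        // Never start below a number that was already used.
        int floor = sorted.isEmpty() ? 0 : sorted.last() + 1;
        this.nextFresh = Math.max(jumpBase, floor);
    }

    /** Reuses a previous-generation number if one is left, otherwise jumps ahead. */
    int next() {
        Integer reused = recovered.pollFirst();
        return reused != null ? reused : nextFresh++;
    }
}
```

With previous numbers {0, 1, 2} and a jumpBase of 1000, successive calls to
next() return 0, 1, 2 and then 1000, 1001, and so on.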
I also made sure that attempts are replayed in the order of their original
start times. Otherwise, as my test revealed, we may replay them in the wrong
order and with the wrong times.
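
The ordering step could look roughly like the following sketch. RecoveredAttempt
and ReplayOrder are hypothetical names, not classes from the patch, and breaking
ties on the attempt number is an assumption:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical record of one attempt read back from the previous generation. */
class RecoveredAttempt {
    final int attemptNumber;
    final long startTimeMillis;

    RecoveredAttempt(int attemptNumber, long startTimeMillis) {
        this.attemptNumber = attemptNumber;
        this.startTimeMillis = startTimeMillis;
    }
}

class ReplayOrder {
    /** Returns a copy of the attempts sorted by original start time. */
    static List<RecoveredAttempt> byStartTime(List<RecoveredAttempt> attempts) {
        List<RecoveredAttempt> ordered = new ArrayList<>(attempts);
        // Earliest start first; ties fall back to the attempt number.
        ordered.sort(
            Comparator.comparingLong((RecoveredAttempt a) -> a.startTimeMillis)
                .thenComparingInt(a -> a.attemptNumber));
        return ordered;
    }
}
```

Replaying the returned list front to back keeps the recovered events in the
same relative order in which the attempts originally started.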
The test fails without the patch and passes with it.
Sharad, can you please look at the patch and see if it makes sense? Thanks in
advance!
> Restarted+Recovered AM hangs in some corner cases
> -------------------------------------------------
>
> Key: MAPREDUCE-3846
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3846
> Project: Hadoop Map/Reduce
> Issue Type: Sub-task
> Components: mrv2
> Affects Versions: 0.23.0
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Vinod Kumar Vavilapalli
> Priority: Critical
> Attachments: MAPREDUCE-3846-20120210.txt
>
>
> [~karams] found this while testing the AM restart/recovery feature. After the
> first-generation AM crashes (manually killed with kill -9), the
> second-generation AM starts, but hangs after a while.