[ https://issues.apache.org/jira/browse/MAPREDUCE-4992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Lowe reopened MAPREDUCE-4992:
-----------------------------------
This is still occurring in a number of ways:
* If the task attempt that succeeded was attempt 1 but there is no completion
event in the history file for attempt 0, the AM recovers only attempt 0 but
then waits for attempt 1 to complete.
* If two task attempts succeed simultaneously, the AM recovers only attempt 0
but then waits for attempt 1 to complete.
* If the prior AM attempt was backed up in event processing and launched
speculative task attempts *after* a task attempt completed, the recovering AM
ends up waiting on them even though they were never launched.
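All three cases share one shape: recovery waits on a task attempt whose
completion can never be replayed from the history file. The following is a
minimal sketch of that condition only, not the actual RecoveryService code.
The class name, method names, and attempt IDs are hypothetical and chosen for
illustration.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the hang condition; none of these names come from Hadoop.
final class RecoveryHangSketch {

    // Attempts that recovery waits on but whose completion is absent from
    // the set of events that can be replayed from the history file.
    static Set<String> stuckAttempts(Set<String> awaited, Set<String> replayable) {
        Set<String> stuck = new TreeSet<>(awaited);
        stuck.removeAll(replayable);
        return stuck;
    }

    // Recovery can finish only if every awaited attempt is replayable.
    static boolean recoveryCanFinish(Set<String> awaited, Set<String> replayable) {
        return stuckAttempts(awaited, replayable).isEmpty();
    }

    public static void main(String[] args) {
        // First case above: only attempt 0 is recovered, but attempt 1 is awaited.
        Set<String> awaited = new TreeSet<>(Arrays.asList("attempt_1"));
        Set<String> replayable = new TreeSet<>(Arrays.asList("attempt_0"));

        System.out.println("stuck: " + stuckAttempts(awaited, replayable));
        System.out.println("can finish: " + recoveryCanFinish(awaited, replayable));
    }
}
```

In this model a fix has to make the awaited set a subset of the replayable
set, either by recovering every attempt that completed or by not waiting on
attempts that have no recorded completion.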
> AM hangs in RecoveryService when recovering tasks with speculative attempts
> ---------------------------------------------------------------------------
>
> Key: MAPREDUCE-4992
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4992
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am
> Affects Versions: trunk, 2.0.2-alpha, 0.23.6
> Reporter: Robert Parker
> Assignee: Robert Parker
> Priority: Critical
> Fix For: 0.23.7, 2.0.5-beta
>
> Attachments: MAPREDUCE-4992v1.patch, MAPREDUCE-4992v2.patch
>
>
> A job hung in the Recovery Service on an AM restart. There were four map
> task events that were not processed, which prevented the complete task
> count from reaching zero, the condition that exits the recovery service.
> All four tasks were speculative.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira