[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-4992:
----------------------------------

    Status: Open  (was: Patch Available)

This approach looks OK for the short term.  I'm not thrilled about the idea of 
explicitly losing information on task attempts that happened to not complete, 
as it will be odd in the history of the recovered AM to see a map task whose 
single attempt ends in _1 or _2 instead of _0.  If this goes in we should 
file a follow-up JIRA to fix recovery so attempts that were "in-flight" when 
the AM crashed are at least documented in some way on the subsequent AM (e.g., 
we mark them as KILLED or something, but at least the user can see what nodes 
they ran on and what time they were launched).

There is one thing I'd like to see fixed in the patch.  When we're iterating 
the taskAttempts in the {{taskInfo}} and filtering out attempts that didn't 
complete, we should walk and remove entries using an iterator rather than 
reaching around and calling {{remove}} on the map.
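
For illustration, the iterator-based removal could look roughly like the 
sketch below.  This is not code from the patch: the {{AttemptFilter}} class, 
the {{AttemptInfo}} type with its {{completed}} flag, and the string attempt 
IDs are hypothetical stand-ins for whatever {{taskInfo}} actually holds.  The 
point is only that {{Iterator.remove}} deletes the current entry safely, 
whereas calling {{remove}} on a {{HashMap}} while iterating over it can 
throw {{ConcurrentModificationException}}.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class AttemptFilter {

    /** Hypothetical stand-in for the per-attempt record kept in taskInfo. */
    static final class AttemptInfo {
        final boolean completed;

        AttemptInfo(boolean completed) {
            this.completed = completed;
        }
    }

    /**
     * Removes every attempt that did not complete, in place.
     * Entries are removed through the iterator, never through the map itself.
     */
    static void dropIncompleteAttempts(Map<String, AttemptInfo> attempts) {
        Iterator<Map.Entry<String, AttemptInfo>> it = attempts.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, AttemptInfo> entry = it.next();
            if (!entry.getValue().completed) {
                // Safe: removes the entry most recently returned by next().
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical attempt IDs; real IDs are longer but end in _0, _1, ...
        Map<String, AttemptInfo> attempts = new HashMap<>();
        attempts.put("attempt_m_000001_0", new AttemptInfo(false));
        attempts.put("attempt_m_000001_1", new AttemptInfo(true));

        dropIncompleteAttempts(attempts);
        System.out.println("attempts kept: " + attempts.size());
    }
}
```

On Java 8 or later, {{attempts.values().removeIf(a -> !a.completed)}} would 
do the same thing in one line, assuming the build targets that version.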
                
> AM hung in RecoveryService
> --------------------------
>
>                 Key: MAPREDUCE-4992
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4992
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.0.2-alpha, trunk, 0.23.6
>            Reporter: Robert Parker
>            Assignee: Robert Parker
>            Priority: Critical
>         Attachments: MAPREDUCE-4992v1.patch
>
>
> A job hung in the Recovery Service on an AM restart. There were four map 
> task events that were not processed, which prevented the complete task 
> count from reaching zero, the condition for exiting the recovery service. 
> All four tasks were speculative.

