[jira] [Updated] (MAPREDUCE-4982) AM hung with one pending map task

Jason Lowe (JIRA) Thu, 07 Feb 2013 14:55:14 -0800

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Jason Lowe updated MAPREDUCE-4982:
----------------------------------

    Affects Version/s:     (was: 2.0.3-alpha)

I see some convincing evidence in the AM log that what I suspected is true.  
There was one fewer "Assigned from earlierFailedMaps" entry in the log than 
there were failed map attempts that received containers.  One of them appears 
to have been allocated a normal-priority container, although from looking at 
the code I'm not sure how.

Originally I thought trunk and 2.0.3-alpha would have the same issue, but I 
think MAPREDUCE-4893 inadvertently fixes this scenario.  It changed the logic 
so that it first tries to assign containers without locality (i.e., fast-fail 
map and reducer-priority containers), then falls through to assigning them to 
normal maps if it still hasn't found an assignment.  Before that change, it 
would throw away a fast-fail container if no fast-fail map was around to take 
it.  There's an assert in the code indicating that only normal-priority map 
containers are expected, but from what I've seen it does appear that fast-fail 
maps can somehow steal a normal-priority container on occasion, leaving the 
container for a subsequent fast-fail request to be assigned to the normal map 
attempt that was stolen from.
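To make the before/after difference concrete, here is a minimal Java sketch under heavy simplifying assumptions. The class, the Priority enum, and the assign/replay methods are hypothetical illustrations, not the actual RMContainerAllocator code. Locality matching, reducer-priority containers, and the RM-side request counts are omitted. The steal itself is modeled directly, since the comment above notes it is unclear from the code how it happens.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/**
 * Hypothetical, heavily simplified model of the map-container assignment
 * discussed in MAPREDUCE-4982. Not the real RMContainerAllocator code:
 * locality matching and reducer-priority containers are omitted.
 */
public class FastFailAssignmentSketch {
    /** Illustrative container priorities (reducer priority omitted). */
    enum Priority { FAST_FAIL_MAP, MAP }

    /** Map attempts still waiting on the AM side for a container. */
    final Deque<String> fastFailMaps = new ArrayDeque<>();
    final Deque<String> normalMaps = new ArrayDeque<>();

    /** false = behaviour before MAPREDUCE-4893, true = after (fall-through). */
    final boolean fallThrough;

    FastFailAssignmentSketch(boolean fallThrough) {
        this.fallThrough = fallThrough;
    }

    /** Returns the attempt that gets the container, or null if it is released. */
    String assign(Priority allocated) {
        // First pass: a fast-fail container goes to a waiting fast-fail map.
        if (allocated == Priority.FAST_FAIL_MAP && !fastFailMaps.isEmpty()) {
            return fastFailMaps.poll();
        }
        // Normal containers always reach normal maps; with fall-through, an
        // unmatched fast-fail container is also offered to them.
        if (allocated == Priority.MAP || fallThrough) {
            return normalMaps.poll(); // null if no normal map is waiting either
        }
        // Without fall-through: the unmatched fast-fail container is thrown away.
        return null;
    }

    /** Replays the suspected sequence; returns who got each of the two containers. */
    static String[] replay(boolean fallThrough) {
        FastFailAssignmentSketch am = new FastFailAssignmentSketch(fallThrough);
        am.fastFailMaps.add("f1"); // re-run of a failed map attempt
        am.normalMaps.add("m1");   // ordinary map attempt
        // Step 1: the RM grants a MAP-priority container meant for m1, but the
        // fast-fail attempt steals it (modeled directly, mechanism unknown).
        String first = am.fastFailMaps.poll();
        // Step 2: the FAST_FAIL_MAP container requested for f1 arrives later.
        String second = am.assign(Priority.FAST_FAIL_MAP);
        return new String[] {first, second};
    }

    public static void main(String[] args) {
        // Old policy: the second container is released and m1 keeps waiting.
        System.out.println("before MAPREDUCE-4893: " + Arrays.toString(replay(false)));
        // New policy: the fast-fail container falls through to m1.
        System.out.println("after MAPREDUCE-4893:  " + Arrays.toString(replay(true)));
    }
}
```

Running main prints [f1, null] for the old policy and [f1, m1] for the new one. Falling through instead of releasing means a fast-fail container that arrives with no fast-fail map waiting can still serve the normal map that was left short.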
                
> AM hung with one pending map task
> ---------------------------------
>
>                 Key: MAPREDUCE-4982
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4982
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 0.23.6
>            Reporter: Jason Lowe
>
> Saw a job that hung with one pending map task that never ran.  The task was 
> in the SCHEDULED state with a single attempt that was in the UNASSIGNED 
> state.  The AM looked like it was waiting for a container from the RM, but 
> the RM was never granting it the one container it needed.
> I suspect the AM botched the container request bookkeeping somehow.  More 
> details to follow.

