[
https://issues.apache.org/jira/browse/MAPREDUCE-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Lowe updated MAPREDUCE-4982:
----------------------------------
Affects Version/s: (was: 2.0.3-alpha)

I see some convincing evidence in the AM log that what I suspected is true.
There was one fewer "Assigned from earlierFailedMaps" entry in the log than
there were failed map attempts that received containers. I see that one of
them was allocated a normal-priority container, although I can't tell from
looking at the code how that happened.

Originally I thought trunk and 2.0.3-alpha would have the same issue, but I
think MAPREDUCE-4893 inadvertently fixes this scenario. It changed the logic
so that it first tries to assign containers without locality (i.e., fast-fail
map and reducer priority containers) and then falls through to assigning them
to normal maps if it still hasn't found an assignment. Before that change it
would throw away a fast-fail container if no fast-fail map was around to take
it. There's an assert in the code indicating that only normal-priority map
containers are expected, but from what I've seen it does appear that fast-fail
maps can occasionally steal a normal-priority container, leaving the container
for a subsequent fast-fail request to be assigned to the normal map attempt
that was stolen from.
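
To make the difference between the two behaviors concrete, here is a minimal
sketch of the assignment logic as described in the comment above. It is not
the actual AM allocator code: the class, enum, field, and method names are all
hypothetical, and the queues stand in for whatever bookkeeping the AM really
keeps. It only models "offer the container to requests at its own priority"
versus "fall through to normal maps when nothing else takes it".

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of the behavior described above; names are illustrative
// and do not correspond to the real MapReduce AM classes.
final class ContainerAssignmentSketch {
    enum Priority { FAST_FAIL_MAP, REDUCE, MAP }

    // Pending task attempts waiting for a container, one queue per priority.
    final Deque<String> earlierFailedMaps = new ArrayDeque<>();
    final Deque<String> reduces = new ArrayDeque<>();
    final Deque<String> maps = new ArrayDeque<>();

    // Old behavior as described: a container is only offered to requests at
    // its own priority. Returns the attempt that got the container, or null
    // if nobody took it and the container is thrown away.
    String assignStrict(Priority containerPriority) {
        switch (containerPriority) {
            case FAST_FAIL_MAP:
                return earlierFailedMaps.poll();
            case REDUCE:
                return reduces.poll();
            default:
                return maps.poll();
        }
    }

    // Behavior after the change as described: try the requests without
    // locality first, then fall through to normal maps if still unassigned.
    String assignWithFallThrough(Priority containerPriority) {
        String assigned = null;
        if (containerPriority == Priority.FAST_FAIL_MAP) {
            assigned = earlierFailedMaps.poll();
        } else if (containerPriority == Priority.REDUCE) {
            assigned = reduces.poll();
        }
        if (assigned == null) {
            assigned = maps.poll();
        }
        return assigned;
    }
}
```

In the suspected scenario, the failed attempt has already taken a
normal-priority container, so a normal map attempt is still pending when the
fast-fail container arrives. With assignStrict the container is discarded and
the normal map waits forever; with assignWithFallThrough the normal map
receives it.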
> AM hung with one pending map task
> ---------------------------------
>
> Key: MAPREDUCE-4982
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4982
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am
> Affects Versions: 0.23.6
> Reporter: Jason Lowe
>
> Saw a job that hung with one pending map task that never ran. The task was
> in the SCHEDULED state with a single attempt that was in the UNASSIGNED
> state. The AM looked like it was waiting for a container from the RM, but
> the RM was never granting it the one container it needed.
> I suspect the AM botched the container request bookkeeping somehow. More
> details to follow.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira