[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136445#comment-13136445
 ] 

Robert Joseph Evans commented on MAPREDUCE-3274:
------------------------------------------------

OK, so it is a race condition. The sequence of events is:

{noformat}
attempt_1319242394842_0065_m_000000_0 is Launched (STATE RUNNING)
Many other reducers are launched, filling up the queue's capacity
attempt_1319242394842_0065_r_000008_0 is in the UNASSIGNED state waiting to be 
scheduled
attempt_1319242394842_0065_m_000000_0 is killed for going over its memory limit
attempt_1319242394842_0065_m_000000_0 is cleaned up and a replacement 
attempt_1319242394842_0065_m_000000_1 is added to be scheduled
attempt_1319242394842_0065_r_000008_0 gets a container and goes to the ASSIGNED 
state.
Preemption is triggered. attempt_1319242394842_0065_r_000008_0 is selected and 
is sent a TA_KILL event
(the History Log ignores the event because it has not written out a START event 
for attempt_1319242394842_0065_r_000008_0 yet)
attempt_1319242394842_0065_r_000008_0 transitions to KILLED, going through 
several other states
attempt_1319242394842_0065_r_000008_1 is created to replace 
attempt_1319242394842_0065_r_000008_0 and moves to UNASSIGNED state
Processing attempt_1319242394842_0065_r_000008_0 of type TA_CONTAINER_LAUNCHED 
(The container for the killed task is now launched)
JVM with ID : jvm_1319242394842_0065_r_000008 asked for a task
JVM with ID: jvm_1319242394842_0065_r_000008 given task: 
attempt_1319242394842_0065_r_000004_0
{noformat}

So even though attempt_1319242394842_0065_r_000008_0 was killed, its container, 
when it finally showed up, was given to a different reduce attempt, so the kill 
did not end up freeing any resources at all.
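
The ordering in the log above can be replayed with a toy model. This is only a sketch under assumed names: `Attempt`, `onContainerLaunched`, `replay` and the `releaseIfOwnerKilled` flag are hypothetical and are not MR App Master classes. The guard it shows is one possible way to handle a late launch event, not necessarily the fix that gets committed.

```java
import java.util.ArrayDeque;
import java.util.Optional;
import java.util.Queue;

// Toy model of the race; every name here is hypothetical, not a real MR App Master class.
final class PreemptionRaceSketch {
    enum State { UNASSIGNED, ASSIGNED, RUNNING, KILLED }

    static final class Attempt {
        final String id;
        State state = State.UNASSIGNED;
        Attempt(String id) { this.id = id; }
    }

    /**
     * Called when a container that was allocated for `owner` finally launches.
     * Returns the id of the attempt that ends up running in it, or empty if
     * the container is released back to the queue.
     */
    static Optional<String> onContainerLaunched(
            Attempt owner, Queue<Attempt> waitingReducers, boolean releaseIfOwnerKilled) {
        if (owner.state != State.KILLED) {
            owner.state = State.RUNNING;
            return Optional.of(owner.id);
        }
        if (releaseIfOwnerKilled) {
            return Optional.empty(); // assumed guard: capacity is handed back
        }
        // Behaviour seen in the log: the JVM asks for a task and gets another reducer.
        Attempt next = waitingReducers.poll();
        if (next == null) {
            return Optional.empty();
        }
        next.state = State.RUNNING;
        return Optional.of(next.id);
    }

    /** Replays the ordering from the log for the preempted reducer. */
    static Optional<String> replay(boolean releaseIfOwnerKilled) {
        Attempt preempted = new Attempt("r_000008_0");
        Queue<Attempt> waitingReducers = new ArrayDeque<>();
        waitingReducers.add(new Attempt("r_000004_0"));

        preempted.state = State.ASSIGNED; // got a container, not launched yet
        preempted.state = State.KILLED;   // TA_KILL from preemption arrives first
        // TA_CONTAINER_LAUNCHED arrives only now, after the kill.
        return onContainerLaunched(preempted, waitingReducers, releaseIfOwnerKilled);
    }

    public static void main(String[] args) {
        System.out.println("without guard, container runs: " + replay(false));
        System.out.println("with guard, container runs:    " + replay(true));
    }
}
```

Without the guard, the container requested for the preempted reducer ends up running another reducer, so the replacement map attempt still has nowhere to run. With the guard, the container is handed back and the capacity becomes available.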


                
> Race condition in MR App Master Preemption can cause a deadlock
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-3274
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3274
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, scheduler
>    Affects Versions: 0.23.0, 0.24.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>            Priority: Critical
>             Fix For: 0.23.0, 0.24.0
>
>
> There appears to be a race condition in the MR App Master when it preempts 
> reducers to let a mapper run.  In the particular case that I have been 
> debugging, a reducer was selected for preemption that did not yet have a 
> container assigned to it. When the container became available, that reduce 
> attempt started running, and the earlier TA_KILL event appears to have been 
> ignored.
