Jonathan Eagles created TEZ-3950:
------------------------------------

             Summary: Preempted task attempts intermittently marked as FAILED instead of KILLED
                 Key: TEZ-3950
                 URL: https://issues.apache.org/jira/browse/TEZ-3950
             Project: Apache Tez
          Issue Type: Bug
    Affects Versions: 0.9.2, 0.10.0
            Reporter: Jonathan Eagles
         Attachments: TEZ-3950.fail.patch

TestMockDAGAppMaster.testInternalPreemption intermittently fails with expected:<KILLED> but was:<FAILED>


The crux of the matter is that TaskSchedulerManager sends two events:

- TaskScheduler#deallocatedContainer -> TaskSchedulerManager#containerBeingReleased -> sends AMContainerStopRequest -> TA_CONTAINER_TERMINATING
- AMContainerEventCompleted -> TA_CONTAINER_TERMINATED_BY_SYSTEM
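
To make the two paths concrete, here is a minimal, hypothetical sketch that models each handler hop as one trip through a single FIFO event queue. The hop counts (3 for the release path, 1 for the completion path) are illustrative assumptions, not measured from Tez, and `EventPathModel`, `Message`, and `deliver` are invented names. In this single-queue model the shorter path always arrives first; it does not capture whatever timing variation in the real AM occasionally lets the longer path win.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical model: each "hop" is one trip through a single FIFO event queue.
public class EventPathModel {
    // An event that still needs `hopsLeft` forwarding steps before reaching the task attempt.
    record Message(String event, int hopsLeft) {}

    // Drains the queue and returns events in the order they reach the task attempt.
    static List<String> deliver(List<Message> initial) {
        Queue<Message> queue = new ArrayDeque<>(initial);
        List<String> arrivals = new ArrayList<>();
        while (!queue.isEmpty()) {
            Message m = queue.poll();
            if (m.hopsLeft() == 0) {
                arrivals.add(m.event()); // delivered to the task attempt
            } else {
                // forwarded one more hop: goes to the back of the queue
                queue.add(new Message(m.event(), m.hopsLeft() - 1));
            }
        }
        return arrivals;
    }

    public static void main(String[] args) {
        // Path 1 (assumed 3 hops): deallocatedContainer -> containerBeingReleased
        //   -> AMContainerStopRequest -> TA_CONTAINER_TERMINATING
        // Path 2 (assumed 1 hop): AMContainerEventCompleted -> TA_CONTAINER_TERMINATED_BY_SYSTEM
        List<String> order = deliver(List.of(
                new Message("TA_CONTAINER_TERMINATING", 3),
                new Message("TA_CONTAINER_TERMINATED_BY_SYSTEM", 1)));
        // prints [TA_CONTAINER_TERMINATED_BY_SYSTEM, TA_CONTAINER_TERMINATING]
        System.out.println(order);
    }
}
```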

For a task attempt to be killed correctly, the second message loop must complete first. Because the first path is longer, the second loop almost always completes first. When the first loop wins the race instead, the task attempt is marked as FAILED rather than KILLED.
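
A second hypothetical sketch shows why the ordering matters. In this simplified model the first container event to reach a running attempt decides its final state and later events are ignored, matching the outcome described above; the real Tez task attempt state machine is more involved, and `PreemptionRaceModel`, `apply`, and `finalState` are illustrative names only.

```java
import java.util.List;

// Hypothetical, heavily simplified model: the first container event seen by a
// RUNNING attempt fixes its terminal state.
public class PreemptionRaceModel {
    enum State { RUNNING, KILLED, FAILED }

    static State apply(State current, String event) {
        if (current != State.RUNNING) {
            return current; // terminal in this model: later events are ignored
        }
        switch (event) {
            case "TA_CONTAINER_TERMINATED_BY_SYSTEM":
                return State.KILLED; // preemption observed first: correct outcome
            case "TA_CONTAINER_TERMINATING":
                return State.FAILED; // stop request observed first: the bug
            default:
                return current;
        }
    }

    static State finalState(List<String> events) {
        State s = State.RUNNING;
        for (String e : events) {
            s = apply(s, e);
        }
        return s;
    }

    public static void main(String[] args) {
        // Usual ordering: the shorter completion path arrives first.
        System.out.println(finalState(List.of(
                "TA_CONTAINER_TERMINATED_BY_SYSTEM", "TA_CONTAINER_TERMINATING"))); // KILLED
        // Rare ordering: the longer release path wins the race.
        System.out.println(finalState(List.of(
                "TA_CONTAINER_TERMINATING", "TA_CONTAINER_TERMINATED_BY_SYSTEM"))); // FAILED
    }
}
```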

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
