Jonathan Eagles created TEZ-3950:
------------------------------------
Summary: Preempted task attempts intermittently marked as FAILED
instead of KILLED
Key: TEZ-3950
URL: https://issues.apache.org/jira/browse/TEZ-3950
Project: Apache Tez
Issue Type: Bug
Affects Versions: 0.9.2, 0.10.0
Reporter: Jonathan Eagles
Attachments: TEZ-3950.fail.patch
TestMockDAGAppMaster.testInternalPreemption intermittently fails with
expected:<KILLED> but was:<FAILED>
The crux of the matter is that TaskSchedulerManager sends two events:
- TaskScheduler#deallocatedContainer -> TaskSchedulerManager#containerBeingReleased
  -> sends AMContainerStopRequest -> TA_CONTAINER_TERMINATING
- AMContainerEventCompleted -> TA_CONTAINER_TERMINATED_BY_SYSTEM
For the task attempt to be killed correctly, the second message loop must
complete first. The first path is longer, so the second message loop almost
always completes first. When the first message loop completes first, the task
attempt is marked as FAILED instead of KILLED.
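
The ordering dependence can be illustrated with a toy model. This is only a
sketch and not Tez code: the class PreemptionRaceSketch, its State enum, and
the transition rule are hypothetical, chosen to mirror the behaviour described
above (whichever event reaches a running attempt first decides its final
state). Only the two event names come from this report.

```java
// Toy model only: not Tez code. The transition rule is an assumption
// that mirrors the behaviour described in this report.
final class PreemptionRaceSketch {
    enum State { RUNNING, KILLED, FAILED }

    enum Event { TA_CONTAINER_TERMINATING, TA_CONTAINER_TERMINATED_BY_SYSTEM }

    // Assumed rule: the first event to reach a RUNNING attempt decides its
    // terminal state; later events leave a terminal state unchanged.
    static State transition(State current, Event event) {
        if (current != State.RUNNING) {
            return current;
        }
        return event == Event.TA_CONTAINER_TERMINATED_BY_SYSTEM
                ? State.KILLED
                : State.FAILED;
    }

    // Replays events in the given arrival order, starting from RUNNING.
    static State replay(Event... arrivalOrder) {
        State state = State.RUNNING;
        for (Event event : arrivalOrder) {
            state = transition(state, event);
        }
        return state;
    }

    public static void main(String[] args) {
        // Usual order: the shorter path wins and the attempt is KILLED.
        System.out.println(replay(Event.TA_CONTAINER_TERMINATED_BY_SYSTEM,
                                  Event.TA_CONTAINER_TERMINATING));
        // Rare order: the longer path wins and the attempt is FAILED.
        System.out.println(replay(Event.TA_CONTAINER_TERMINATING,
                                  Event.TA_CONTAINER_TERMINATED_BY_SYSTEM));
    }
}
```

In this model the same two events yield KILLED or FAILED depending only on
arrival order, which is consistent with the intermittent test failure.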
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)