[
https://issues.apache.org/jira/browse/MESOS-7865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139261#comment-16139261
]
Benjamin Mahler commented on MESOS-7865:
----------------------------------------
{noformat}
commit 0b9c3dedb04e9bf2c3d1f1663cf9cd4f47cb674b
Author: Benjamin Mahler <[email protected]>
Date: Thu Aug 10 18:34:15 2017 -0700
Fixed a bug where the agent kills and still launches a task.
The following race leads to the agent both killing and launching a task:
(1) Slave::__run completes, task is now within Executor::queuedTasks.
(2) Slave::killTask locates the executor based on the task ID residing
in queuedTasks, calls Slave::statusUpdate() with TASK_KILLED.
(3) Slave::___run assumes that killed tasks have been removed from
Executor::queuedTasks, but this now occurs asynchronously in
Slave::_statusUpdate. So, the agent still sees the queued task
and delivers it and adds the task to Executor::launchedTasks.
(4) Slave::_statusUpdate runs, removes the task from
Executor::launchedTasks and adds it to Executor::terminatedTasks.
The fix applied here is to synchronously transition queued tasks to
a terminal state when statusUpdate is called. This can be done because
for queued tasks, we do not need to retrieve the container status (the
task never reached the container).
Review: https://reviews.apache.org/r/61639
{noformat}
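The interleaving described in the commit message can be sketched with a small single-threaded model. This is an illustrative toy, not the Mesos implementation: only the names queuedTasks, launchedTasks and terminatedTasks come from the commit message, while ToyExecutor, ToyAgent, the pending work queue, and launchesAfterKill are hypothetical names invented for the example. The pending queue stands in for work that the agent dispatches to run later.

```cpp
#include <deque>
#include <functional>
#include <set>
#include <string>
#include <utility>

// Toy stand-in for the agent's per-executor bookkeeping. Only the three
// container names are taken from the commit message.
struct ToyExecutor {
  std::set<std::string> queuedTasks;
  std::set<std::string> launchedTasks;
  std::set<std::string> terminatedTasks;
};

// Hypothetical model of the agent. `pending` models deferred work.
struct ToyAgent {
  ToyExecutor executor;
  std::deque<std::function<void()>> pending;
  bool synchronousFix;
  int launchesDelivered = 0;

  explicit ToyAgent(bool fix) : synchronousFix(fix) {}

  // Step (1): the task becomes queued; its delivery runs later.
  void queueTask(const std::string& id) {
    executor.queuedTasks.insert(id);
    pending.push_back([this, id] { deliver(id); });
  }

  // Step (2): a kill for a queued task produces a terminal status update.
  void killTask(const std::string& id) {
    if (executor.queuedTasks.count(id) == 0) return;
    if (synchronousFix) {
      // Fixed behaviour: transition the queued task right away, so a
      // delivery that runs later no longer finds it.
      executor.queuedTasks.erase(id);
      executor.terminatedTasks.insert(id);
    }
    // The remainder of the status update still completes later.
    pending.push_back([this, id] { finishStatusUpdate(id); });
  }

  // Step (3): delivery assumes killed tasks have already left queuedTasks.
  void deliver(const std::string& id) {
    if (executor.queuedTasks.erase(id) == 0) return;  // No longer queued.
    executor.launchedTasks.insert(id);
    ++launchesDelivered;
  }

  // Step (4): deferred bookkeeping moves the task to the terminated set.
  void finishStatusUpdate(const std::string& id) {
    executor.queuedTasks.erase(id);
    executor.launchedTasks.erase(id);
    executor.terminatedTasks.insert(id);
  }

  // Runs the deferred work in the order it was enqueued.
  void drain() {
    while (!pending.empty()) {
      std::function<void()> work = std::move(pending.front());
      pending.pop_front();
      work();
    }
  }
};

// Replays steps (1) to (4) and reports how often the killed task was
// delivered to the executor.
int launchesAfterKill(bool synchronousFix) {
  ToyAgent agent(synchronousFix);
  agent.queueTask("task-1");
  agent.killTask("task-1");
  agent.drain();
  return agent.launchesDelivered;
}
```

In this model, launchesAfterKill(false) replays the race and reports one delivery of the killed task, while launchesAfterKill(true) reports none because the task leaves queuedTasks before the deferred delivery runs.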
> Agent may process a kill task and still launch the task.
> --------------------------------------------------------
>
> Key: MESOS-7865
> URL: https://issues.apache.org/jira/browse/MESOS-7865
> Project: Mesos
> Issue Type: Bug
> Components: agent
> Reporter: Benjamin Mahler
> Assignee: Benjamin Mahler
> Priority: Critical
> Fix For: 1.5.0
>
>
> Based on the investigation of MESOS-7744, the agent has a race in which
> "queued" tasks can still be launched after the agent has processed a kill
> task for them. This race was introduced when {{Slave::statusUpdate}} was made
> asynchronous:
> (1) {{Slave::__run}} completes, task is now within {{Executor::queuedTasks}}
> (2) {{Slave::killTask}} locates the executor based on the task ID residing in
> queuedTasks, calls {{Slave::statusUpdate()}} with {{TASK_KILLED}}
> (3) {{Slave::___run}} assumes that killed tasks have been removed from
> {{Executor::queuedTasks}}, but this now occurs asynchronously in
> {{Slave::_statusUpdate}}. So, the agent still sees the queued task,
> delivers it, and adds the task to {{Executor::launchedTasks}}.
> (4) {{Slave::_statusUpdate}} runs, removes the task from
> {{Executor::launchedTasks}} and adds it to {{Executor::terminatedTasks}}.