[
https://issues.apache.org/jira/browse/MESOS-5380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jie Yu updated MESOS-5380:
--------------------------
Labels: mesosphere (was: )
> Killing a queued task can cause the corresponding command executor to never
> terminate.
> --------------------------------------------------------------------------------------
>
> Key: MESOS-5380
> URL: https://issues.apache.org/jira/browse/MESOS-5380
> Project: Mesos
> Issue Type: Bug
> Components: slave
> Affects Versions: 0.28.0, 0.28.1
> Reporter: Jie Yu
> Assignee: Vinod Kone
> Priority: Blocker
> Labels: mesosphere
> Fix For: 0.29.0, 0.28.2
>
>
> We observed this in our testing environment. Sequence of events:
> 1) A command task is queued since the executor has not registered yet.
> 2) The framework issues a killTask.
> 3) Since the executor is in the REGISTERING state, the agent calls
> `statusUpdate(TASK_KILLED, UPID())`.
> 4) `statusUpdate` now calls `containerizer->status()` before calling
> `executor->terminateTask(status.task_id(), status);`, which removes the
> queued task. (Introduced in this patch: https://reviews.apache.org/r/43258).
> 5) Since the above call is asynchronous, the task may still be in
> `queuedTasks` when we check whether to kill the unregistered executor in
> `killTask`:
> {code}
> // TODO(jieyu): Here, we kill the executor if it no longer has
> // any task to run and has not yet registered. This is a
> // workaround for those single task executors that do not have a
> // proper self terminating logic when they haven't received the
> // task within a timeout.
> if (executor->queuedTasks.empty()) {
>   CHECK(executor->launchedTasks.empty())
>       << " Unregistered executor '" << executor->id
>       << "' has launched tasks";
>   LOG(WARNING) << "Killing the unregistered executor " << *executor
>                << " because it has no tasks";
>   executor->state = Executor::TERMINATING;
>   containerizer->destroy(executor->containerId);
> }
> {code}
> 6) Consequently, the `queuedTasks.empty()` check fails,
> `containerizer->destroy()` is never called, and the executor is never
> terminated by Mesos.
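
The ordering problem in the sequence above can be reproduced in isolation. The following is a minimal sketch, not Mesos code: `Executor`, `EventLoop`, `drain`, `killTaskRacy`, and `killTaskOrdered` are hypothetical names, and the callback queue only stands in for an asynchronous continuation. `killTaskOrdered` shows one possible ordering that avoids the race; it is an assumption for illustration and not necessarily how the issue was fixed.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for the agent's per-executor bookkeeping.
struct Executor {
  std::set<std::string> queuedTasks;
  std::set<std::string> launchedTasks;
  bool destroyed = false;  // Set where the container would be destroyed.
};

// Hypothetical stand-in for an event loop: callbacks pushed here run
// later, the way a continuation runs after an asynchronous call.
using EventLoop = std::queue<std::function<void()>>;

// Runs every pending callback in FIFO order.
void drain(EventLoop& loop) {
  while (!loop.empty()) {
    std::function<void()> callback = std::move(loop.front());
    loop.pop();
    callback();
  }
}

// Models steps 3-5: removal of the queued task is deferred, but the
// "no tasks left" check runs immediately and still sees the task.
void killTaskRacy(
    Executor& executor, const std::string& taskId, EventLoop& loop) {
  loop.push([&executor, taskId]() { executor.queuedTasks.erase(taskId); });

  if (executor.queuedTasks.empty()) {
    assert(executor.launchedTasks.empty());
    executor.destroyed = true;
  }
}

// Assumed alternative ordering: perform the check in the same
// continuation that removes the task, so it sees the updated state.
void killTaskOrdered(
    Executor& executor, const std::string& taskId, EventLoop& loop) {
  loop.push([&executor, taskId]() {
    executor.queuedTasks.erase(taskId);

    if (executor.queuedTasks.empty()) {
      assert(executor.launchedTasks.empty());
      executor.destroyed = true;
    }
  });
}
```

With a single queued task, `killTaskRacy` followed by `drain` leaves `destroyed` false even though `queuedTasks` ends up empty, which mirrors step 6. `killTaskOrdered` followed by `drain` sets `destroyed` to true.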