Andrei Budnik created MESOS-9887:
------------------------------------

             Summary: Race condition between two terminal task status updates 
for Docker executor.
                 Key: MESOS-9887
                 URL: https://issues.apache.org/jira/browse/MESOS-9887
             Project: Mesos
          Issue Type: Bug
          Components: agent, containerization
            Reporter: Andrei Budnik
         Attachments: race_example.txt

h2. Overview

Expected behavior:
 Task successfully finishes and sends TASK_FINISHED status update.

Observed behavior:
 Task successfully finishes, but the agent sends TASK_FAILED with the reason 
"REASON_EXECUTOR_TERMINATED".

In normal circumstances, Docker executor 
[sends|https://github.com/apache/mesos/blob/0026ea46dc35cbba1f442b8e425c6cbaf81ee8f8/src/docker/executor.cpp#L758]
 final status update TASK_FINISHED to the agent, which then [gets 
processed|https://github.com/apache/mesos/blob/0026ea46dc35cbba1f442b8e425c6cbaf81ee8f8/src/slave/slave.cpp#L5543]
 by the agent before termination of the executor's process.

However, if the processing of the initial TASK_FINISHED gets delayed, then 
there is a chance that Docker executor terminates and the agent 
[triggers|https://github.com/apache/mesos/blob/0026ea46dc35cbba1f442b8e425c6cbaf81ee8f8/src/slave/slave.cpp#L6662]
 TASK_FAILED which will [be 
handled|https://github.com/apache/mesos/blob/0026ea46dc35cbba1f442b8e425c6cbaf81ee8f8/src/slave/slave.cpp#L5816-L5826]
 prior to the TASK_FINISHED status update.

See attached logs which contain an example of the race condition.
h2. Reproducing bug

1. Add the following code:
{code:java}
  static int c = 0;
  if (++c == 3) { // to skip TASK_STARTING and TASK_RUNNING status updates.
        ::sleep(2);
  }
{code}
to the 
[`ComposingContainerizerProcess::status`|https://github.com/apache/mesos/blob/0026ea46dc35cbba1f442b8e425c6cbaf81ee8f8/src/slave/containerizer/composing.cpp#L578]
 and to the 
[`DockerContainerizerProcess::status`|https://github.com/apache/mesos/blob/0026ea46dc35cbba1f442b8e425c6cbaf81ee8f8/src/slave/containerizer/docker.cpp#L2167].

2. Recompile mesos

3. Launch mesos master and agent locally

4. Launch a simple Docker task via `mesos-execute`:
{code}
#  cd build
./src/mesos-execute --master="`hostname`:5050" --name="a" 
--containerizer=docker --docker_image=alpine --resources="cpus:1;mem:32" 
--command="ls"
{code}
h2. Race condition - description

1. Mesos agent receives TASK_FINISHED status update and then subscribes on 
[`containerizer->status()`|https://github.com/apache/mesos/blob/0026ea46dc35cbba1f442b8e425c6cbaf81ee8f8/src/slave/slave.cpp#L5754-L5761].

2. `containerizer->status()` operation for TASK_FINISHED status update gets 
delayed in the composing containerizer (e.g. due to switch of the worker thread 
that executes `status` method).

3. Docker executor terminates and the agent 
[triggers|https://github.com/apache/mesos/blob/0026ea46dc35cbba1f442b8e425c6cbaf81ee8f8/src/slave/slave.cpp#L6662]
 TASK_FAILED.

4. Docker containerizer destroys the container. A registered callback for the 
`containerizer->wait` call in the composing containerizer dispatches [lambda 
function|https://github.com/apache/mesos/blob/0026ea46dc35cbba1f442b8e425c6cbaf81ee8f8/src/slave/containerizer/composing.cpp#L368-L373]
 that will clean up `containers_` map.

5. Composing c'zer resumes and dispatches 
`[status()|https://github.com/apache/mesos/blob/0026ea46dc35cbba1f442b8e425c6cbaf81ee8f8/src/slave/containerizer/composing.cpp#L579]`
 method to the Docker containerizer for TASK_FINISHED, which in turn hangs for 
a few seconds.

6. Corresponding `containerId` gets removed from the `containers_` map of the 
composing c'zer.

7. Mesos agent subscribes on 
[`containerizer->status()`|https://github.com/apache/mesos/blob/0026ea46dc35cbba1f442b8e425c6cbaf81ee8f8/src/slave/slave.cpp#L5754-L5761]
 for the TASK_FAILED status update.

8. Composing c'zer returns ["Container not 
found"|https://github.com/apache/mesos/blob/0026ea46dc35cbba1f442b8e425c6cbaf81ee8f8/src/slave/containerizer/composing.cpp#L576]
 for TASK_FAILED.

9. 
`[Slave::_statusUpdate|https://github.com/apache/mesos/blob/0026ea46dc35cbba1f442b8e425c6cbaf81ee8f8/src/slave/slave.cpp#L5826]`
 stores TASK_FAILED terminal status update in the executor's data structure.

10. Docker executor resumes and finishes processing of `status()` method for 
TASK_FINISHED. Finally, it returns control to the `Slave::_statusUpdate` 
continuation. This method 
[discovers|https://github.com/apache/mesos/blob/0026ea46dc35cbba1f442b8e425c6cbaf81ee8f8/src/slave/slave.cpp#L5808-L5814]
 that the executor has already been destroyed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to