Jie Yu created MESOS-7926:
-----------------------------

             Summary: Abnormal termination of default executor can cause 
MesosContainerizer::destroy to fail. 
                 Key: MESOS-7926
                 URL: https://issues.apache.org/jira/browse/MESOS-7926
             Project: Mesos
          Issue Type: Bug
    Affects Versions: 1.3.1, 1.2.2, 1.4.1
            Reporter: Jie Yu
            Priority: Critical


This is the sequence of events:
1) Default executor launches a nested container
2) Default executor invokes agent API WAIT_NESTED_CONTAINER
3) Default executor is killed
4) The connection to the agent for WAIT_NESTED_CONTAINER breaks
5) libprocess discards the future, and the discard propagates to the code 
[here](https://github.com/apache/mesos/blob/1.4.0-rc3/src/slave/containerizer/mesos/containerizer.cpp#L1955-L1956).
6) `termination.future()` now has its discard flag set (i.e., 
`hasDiscard() == true`).
7) Default executor termination triggers container destroy for the nested 
container
8) When the destroy of the nested container is done, control reaches the code 
[here](https://github.com/apache/mesos/blob/1.4.0-rc3/src/slave/containerizer/mesos/containerizer.cpp#L2176-L2177).
9) In the 
[thenf](https://github.com/apache/mesos/blob/1.4.0-rc3/3rdparty/libprocess/include/process/future.hpp#L1299-L1301)
handler for `termination.future()`, the discard flag is set 
(`hasDiscard() == true`), so `promise->discard()` is called, which puts the 
returned future in the DISCARDED state.
10) The top-level container destroy fails because the nested container destroy 
failed.
11) As a result, none of the isolator cleanup for the top-level container is 
invoked (e.g., CNI detach).
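
The interaction in steps 5-11 can be sketched with a toy model. This is not libprocess code: `ToyFuture`, `ToyPromise`, `thenLike`, `nestedDestroyResult` and `topLevelCleanupRuns` are hypothetical names invented for illustration, and the model only assumes the behavior described above, namely that a discard request sets a flag that survives completion and that the continuation handler checks that flag.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-ins for libprocess's Future/Promise. These are hypothetical
// types for illustration only, NOT the real libprocess API.
enum class State { PENDING, READY, DISCARDED };

struct Shared {
  State state = State::PENDING;
  bool discardRequested = false;  // Models what hasDiscard() reports.
  std::vector<std::function<void()>> callbacks;
};

struct ToyFuture {
  std::shared_ptr<Shared> data = std::make_shared<Shared>();

  // A consumer asks to abandon the result. This only records the request;
  // the producer may still complete the future afterwards.
  void discard() const {
    if (data->state == State::PENDING) data->discardRequested = true;
  }

  bool hasDiscard() const { return data->discardRequested; }
  State state() const { return data->state; }

  // Run `f` once the future leaves PENDING (immediately if it already has).
  void onAny(std::function<void()> f) const {
    if (data->state == State::PENDING) {
      data->callbacks.push_back(std::move(f));
    } else {
      f();
    }
  }
};

struct ToyPromise {
  ToyFuture f;

  ToyFuture future() const { return f; }
  void set() { transition(State::READY); }
  void discard() { transition(State::DISCARDED); }

 private:
  void transition(State s) {
    assert(f.data->state == State::PENDING);  // Complete at most once.
    f.data->state = s;
    // Detach the callbacks before running them so they are released after.
    auto callbacks = std::move(f.data->callbacks);
    f.data->callbacks.clear();
    for (auto& callback : callbacks) callback();
  }
};

// Mirrors the check described in step 9: a READY upstream whose discard
// flag is set yields a DISCARDED result instead of a READY one.
ToyFuture thenLike(const ToyFuture& upstream) {
  auto promise = std::make_shared<ToyPromise>();
  upstream.onAny([upstream, promise]() {
    if (upstream.state() == State::READY && !upstream.hasDiscard()) {
      promise->set();
    } else {
      promise->discard();
    }
  });
  return promise->future();
}

State nestedDestroyResult(bool waiterDisconnects) {
  ToyPromise termination;

  // Steps 4-6: the WAIT_NESTED_CONTAINER connection breaks and the discard
  // reaches termination.future().
  if (waiterDisconnects) termination.future().discard();

  // Step 8: destroy chains a continuation on the termination future.
  ToyFuture destroyed = thenLike(termination.future());

  // The nested container actually terminates.
  termination.set();

  return destroyed.state();
}

// Steps 10-11, heavily simplified (an assumption of this sketch): top-level
// cleanup only runs if the nested destroy result is READY.
bool topLevelCleanupRuns(bool waiterDisconnects) {
  return nestedDestroyResult(waiterDisconnects) == State::READY;
}
```

In this model, `nestedDestroyResult(false)` is `State::READY`, while `nestedDestroyResult(true)` is `State::DISCARDED` even though the termination promise was set successfully, so `topLevelCleanupRuns(true)` is `false`. Attaching the continuation after `termination.set()` instead of before gives the same outcome here, because the flag is checked when the handler runs.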



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
