[ 
https://issues.apache.org/jira/browse/MESOS-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-7926:
--------------------------
    Description: 
This is the sequence of events:
1) Default executor launches a nested container
2) Default executor invokes agent API WAIT_NESTED_CONTAINER
3) Default executor is killed
4) The connection to the agent for WAIT_NESTED_CONTAINER breaks
5) libprocess discards the future, and the discard propagates to the code 
[here|https://github.com/apache/mesos/blob/1.4.0-rc3/src/slave/containerizer/mesos/containerizer.cpp#L1955-L1956].
6) `termination.future()` now has its discard flag set (i.e., 
hasDiscard() == true).
7) Default executor termination triggers container destroy for the nested 
container
8) When the destroy of the nested container is done, control reaches 
[here|https://github.com/apache/mesos/blob/1.4.0-rc3/src/slave/containerizer/mesos/containerizer.cpp#L2176-L2177].
9) In the 
[thenf|https://github.com/apache/mesos/blob/1.4.0-rc3/3rdparty/libprocess/include/process/future.hpp#L1299-L1301]
 handler for `termination.future()`, since `termination.future()` has its 
discard flag set (hasDiscard() == true), promise->discard() is called, which 
causes the returned future to be in the DISCARDED state.
10) The top-level container destroy then fails because the nested container 
destroy failed.
11) As a result, none of the isolator cleanup for the top-level container is 
called (e.g., CNI detach).

  was:
This is the sequence of events:
1) Default executor launches a nested container
2) Default executor invokes agent API WAIT_NESTED_CONTAINER
3) Default executor is killed
4) The connection to the agent for WAIT_NESTED_CONTAINER breaks
5) libprocess discard the future, which propagates to the code 
[here](https://github.com/apache/mesos/blob/1.4.0-rc3/src/slave/containerizer/mesos/containerizer.cpp#L1955-L1956).
6) `termination.future()` has the discard flag being set to true (i.e., 
hasDiscard() == true).
7) Default executor termination triggers container destroy for the nested 
container
8) When the destroy of the nested container is done, the control will reach 
[here](https://github.com/apache/mesos/blob/1.4.0-rc3/src/slave/containerizer/mesos/containerizer.cpp#L2176-L2177).
9) In the 
[thenf](https://github.com/apache/mesos/blob/1.4.0-rc3/3rdparty/libprocess/include/process/future.hpp#L1299-L1301)
 handler for 'termination.future()', since `termination.future()` has discard 
flag set (hasDiscard() == true), we'll call promise->discard(), which cause the 
returned future to be in DISCARDED state.
10) The top level container destroy will fail because nested container destroy 
failed
11) This cause all isolator cleanup for the top level container not being 
called (e.g., CNI detach).


> Abnormal termination of default executor can cause 
> MesosContainerizer::destroy to fail. 
> ----------------------------------------------------------------------------------------
>
>                 Key: MESOS-7926
>                 URL: https://issues.apache.org/jira/browse/MESOS-7926
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 1.2.2, 1.3.1, 1.4.1
>            Reporter: Jie Yu
>            Priority: Critical
>
> This is the sequence of events:
> 1) Default executor launches a nested container
> 2) Default executor invokes agent API WAIT_NESTED_CONTAINER
> 3) Default executor is killed
> 4) The connection to the agent for WAIT_NESTED_CONTAINER breaks
> 5) libprocess discards the future, and the discard propagates to the code 
> [here|https://github.com/apache/mesos/blob/1.4.0-rc3/src/slave/containerizer/mesos/containerizer.cpp#L1955-L1956].
> 6) `termination.future()` now has its discard flag set (i.e., 
> hasDiscard() == true).
> 7) Default executor termination triggers container destroy for the nested 
> container
> 8) When the destroy of the nested container is done, control reaches 
> [here|https://github.com/apache/mesos/blob/1.4.0-rc3/src/slave/containerizer/mesos/containerizer.cpp#L2176-L2177].
> 9) In the 
> [thenf|https://github.com/apache/mesos/blob/1.4.0-rc3/3rdparty/libprocess/include/process/future.hpp#L1299-L1301]
>  handler for `termination.future()`, since `termination.future()` has its 
> discard flag set (hasDiscard() == true), promise->discard() is called, which 
> causes the returned future to be in the DISCARDED state.
> 10) The top-level container destroy then fails because the nested container 
> destroy failed.
> 11) As a result, none of the isolator cleanup for the top-level container is 
> called (e.g., CNI detach).
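
To make steps 5 through 9 easier to follow, here is a minimal toy model of 
the discard propagation, written in Python rather than C++. It is only a 
sketch under stated assumptions: the `Future`, `Promise`, `State` and 
`simulate` names are hypothetical and are not the libprocess API. The model 
assumes, as the description above states, that a discard request only sets a 
flag on a pending future, that the flag propagates from a chained future back 
to its source, and that the continuation handler discards its result when the 
source is ready but has the flag set.

```python
from enum import Enum


class State(Enum):
    PENDING = "PENDING"
    READY = "READY"
    DISCARDED = "DISCARDED"


class Future:
    """Toy future with a sticky discard-request flag (not libprocess)."""

    def __init__(self):
        self.state = State.PENDING
        self.value = None
        self.discard_requested = False
        self._callbacks = []
        self._discard_hooks = []

    def has_discard(self):
        return self.discard_requested

    def discard(self):
        # A discard request only sets a flag; the future stays PENDING.
        if self.state is State.PENDING and not self.discard_requested:
            self.discard_requested = True
            for hook in self._discard_hooks:
                hook()  # propagate the request to the upstream future

    def _transition(self, state, value=None):
        if self.state is not State.PENDING:
            return
        self.state, self.value = state, value
        callbacks, self._callbacks = self._callbacks, []
        for callback in callbacks:
            callback(self)

    def then(self, fn):
        result = Future()
        # Discarding the chained future also requests a discard on the source.
        result._discard_hooks.append(self.discard)

        def handler(source):
            # Mirrors the check described in step 9.
            if source.state is State.READY:
                if source.has_discard():
                    result._transition(State.DISCARDED)  # fn is never run
                else:
                    result._transition(State.READY, fn(source.value))
            else:
                result._transition(State.DISCARDED)

        if self.state is State.PENDING:
            self._callbacks.append(handler)
        else:
            handler(self)
        return result


class Promise:
    def __init__(self):
        self.future = Future()

    def set(self, value):
        self.future._transition(State.READY, value)


def simulate(connection_breaks):
    termination = Promise()
    # Step 2: the wait call chains on the termination future.
    wait_response = termination.future.then(lambda status: status)
    if connection_breaks:
        # Steps 4-6: the broken connection discards the response future.
        wait_response.discard()
    cleanup_ran = []
    # Steps 7-9: destroy chains its continuation on the same future.
    destroy = termination.future.then(
        lambda status: cleanup_ran.append(status) or status)
    termination.set("killed")  # step 8: the nested container has terminated
    return termination.future.has_discard(), destroy.state, cleanup_ran
```

In this model, `simulate(True)` leaves the destroy future in the DISCARDED 
state with the continuation never run, while `simulate(False)` lets the 
continuation run and the destroy future become READY. This matches the 
sequence in the description, where one caller's discard request leaks into 
an unrelated continuation on the shared termination future.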



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
