Jie Yu created MESOS-7926:
-----------------------------
Summary: Abnormal termination of default executor can cause
MesosContainerizer::destroy to fail.
Key: MESOS-7926
URL: https://issues.apache.org/jira/browse/MESOS-7926
Project: Mesos
Issue Type: Bug
Affects Versions: 1.3.1, 1.2.2, 1.4.1
Reporter: Jie Yu
Priority: Critical
This is the sequence of events:
1) Default executor launches a nested container
2) Default executor invokes agent API WAIT_NESTED_CONTAINER
3) Default executor is killed
4) The connection to the agent for WAIT_NESTED_CONTAINER breaks
5) libprocess discards the future, which propagates to the code
[here](https://github.com/apache/mesos/blob/1.4.0-rc3/src/slave/containerizer/mesos/containerizer.cpp#L1955-L1956).
6) `termination.future()` now has its discard flag set (i.e.,
`hasDiscard() == true`).
7) Default executor termination triggers container destroy for the nested
container
8) When the destroy of the nested container is done, control reaches
[here](https://github.com/apache/mesos/blob/1.4.0-rc3/src/slave/containerizer/mesos/containerizer.cpp#L2176-L2177).
9) In the
[thenf](https://github.com/apache/mesos/blob/1.4.0-rc3/3rdparty/libprocess/include/process/future.hpp#L1299-L1301)
handler for `termination.future()`, since `termination.future()` has its discard
flag set (`hasDiscard() == true`), we call `promise->discard()`, which causes the
returned future to be in the DISCARDED state.
10) The top-level container destroy then fails because the nested container
destroy failed
11) As a result, isolator cleanup for the top-level container (e.g., CNI
detach) is never called.
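
The discard-flag behavior in steps 5 to 9 can be reproduced in isolation. The following is only a minimal sketch in plain C++: the `Data`, `Future`, `Promise`, `then` and `destroyDiscarded` definitions are hypothetical stand-ins written for this illustration, not the libprocess or Mesos code, and they model only the one property described above (a continuation on a READY future that has its discard flag set yields a DISCARDED future).

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <vector>

enum class State { PENDING, READY, DISCARDED };

// Shared state behind a toy future. Hypothetical; not libprocess.
struct Data {
  State state = State::PENDING;
  bool discardRequested = false;  // Plays the role of hasDiscard().
  int value = 0;
  std::vector<std::function<void(const Data&)>> callbacks;
};

struct Future {
  std::shared_ptr<Data> data = std::make_shared<Data>();

  bool hasDiscard() const { return data->discardRequested; }
  bool isReady() const { return data->state == State::READY; }
  bool isDiscarded() const { return data->state == State::DISCARDED; }

  // A consumer's discard request only sets the flag; the state is left
  // unchanged and the producer may still complete the future.
  void discard() { data->discardRequested = true; }
};

struct Promise {
  Future f;

  Future future() const { return f; }  // Copies share the same Data.
  void set(int value) { transition(State::READY, value); }
  void discard() { transition(State::DISCARDED, 0); }

private:
  void transition(State state, int value) {
    if (f.data->state != State::PENDING) return;
    f.data->state = state;
    f.data->value = value;
    for (auto& callback : f.data->callbacks) callback(*f.data);
  }
};

// Toy continuation mirroring the behavior described in step 9: a READY
// future with the discard flag set discards the chained promise.
Future then(const Future& future, std::function<int(int)> fn) {
  auto promise = std::make_shared<Promise>();
  auto handler = [promise, fn](const Data& d) {
    if (d.state == State::READY && !d.discardRequested) {
      promise->set(fn(d.value));
    } else {
      promise->discard();  // READY with the flag set, or DISCARDED.
    }
  };
  if (future.data->state == State::PENDING) {
    future.data->callbacks.push_back(handler);
  } else {
    handler(*future.data);
  }
  return promise->future();
}

// Returns true if the chained "destroy" future ends up DISCARDED.
bool destroyDiscarded(bool waitConnectionBroke) {
  Promise termination;
  if (waitConnectionBroke) {
    // Stands in for steps 4-6: the broken WAIT_NESTED_CONTAINER
    // connection leaves a discard request on the termination future.
    termination.future().discard();
  }
  Future destroy =
      then(termination.future(), [](int status) { return status; });
  termination.set(0);  // Step 8: the nested container destroy completes.
  return destroy.isDiscarded();
}
```

In this sketch, `destroyDiscarded(true)` returns true even though the termination itself completed successfully, while `destroyDiscarded(false)` returns false. The only difference between the two runs is the earlier discard request left behind by the broken wait.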