[ https://issues.apache.org/jira/browse/MESOS-9334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661975#comment-16661975 ]

Qian Zhang commented on MESOS-9334:
-----------------------------------

After reading some libevent code and our code that calls libevent, I think the
root cause of this issue is that, after we ask libevent to poll an fd, that fd
gets disabled inside libevent due to a race. Here is the flow:
 # Container1 is launched, and the cgroups memory subsystem calls
`cgroups::memory::oom::listen()` to listen for OOM events for this container.
Internally, that function opens an fd, asks libevent to poll it, and returns a
future to the cgroups memory subsystem.
 # Container1 exits. When we destroy it, the cleanup method of the cgroups
memory subsystem discards the future obtained in #1. As a result,
`Listener::finalize()` is called (see [this
code|https://github.com/apache/mesos/blob/1.7.0/src/linux/cgroups.cpp#L1069:L1087]
for details), and it will:
 ** Discard the future returned by the libevent poll, which causes
`pollDiscard()` to be called; that in turn triggers `pollCallback` to be
executed *asynchronously* (see [this
code|https://github.com/apache/mesos/blob/1.7.0/3rdparty/libprocess/src/posix/libevent/libevent_poll.cpp#L66:L70]
for details).
 ** Close the fd opened in #1 *immediately*, which means the fd number can now
be reused.
 # Container2 is launched, and the CNI isolator calls `io::read` to read the
stdout/stderr of the CNI plugin for this container. Internally, `io::read`
*reuses* the fd number closed in #2 and asks libevent to poll it.
 # Now the function `pollCallback` for container1 is executed. It deletes the
poll object, which triggers `event_free` to deallocate the event for this
container (see [this
code|https://github.com/apache/mesos/blob/1.7.0/3rdparty/libprocess/src/posix/libevent/libevent_poll.cpp#L50:L52]
for details). Internally, `event_free` calls `event_del` ->
`event_del_internal` -> `evmap_io_del` -> `evsel->del` to *disable* the fd
(see [this
code|https://github.com/libevent/libevent/blob/release-2.0.22-stable/event-internal.h#L78:L79]
for details), but that fd is now being used to read the stdout/stderr of
container2 in #3. Since the fd is disabled inside libevent, the `io::read` we
do in #3 will never return, so container2 will be stuck in the `ISOLATING`
state. The sketch below illustrates this interleaving.
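
To make the interleaving concrete, here is a minimal, self-contained sketch
using plain libevent, with two pipes standing in for the OOM eventfd and the
CNI plugin's stdout. It is *not* the Mesos code itself and the names are
illustrative only; it just mimics the ordering above: the old fd is closed
immediately, the fd number is reused by a new event, and only then does the
deferred `event_free` run.
{code:cpp}
// Sketch of the interleaving described above, using plain libevent 2.x.
// Build with: g++ sketch.cpp -levent
#include <event2/event.h>
#include <unistd.h>
#include <cstdio>

static void pollCallback(evutil_socket_t fd, short, void*)
{
  // In libprocess this is where the future returned by io::poll would be
  // completed or discarded.
  printf("pollCallback fired for fd %d\n", fd);
}

int main()
{
  struct event_base* base = event_base_new();

  // #1: "container1" registers interest in an fd (stand-in for the OOM eventfd).
  int pipe1[2];
  if (pipe(pipe1) != 0) { perror("pipe"); return 1; }
  struct event* ev1 = event_new(base, pipe1[0], EV_READ, pollCallback, nullptr);
  event_add(ev1, nullptr);

  // #2: the future is discarded. In libprocess this schedules pollCallback
  // (which will eventually event_free the event) to run *later* on the loop
  // thread; here we simply postpone the event_free to step #4. The fd,
  // however, is closed *immediately*, as Listener::finalize() does.
  close(pipe1[0]);
  close(pipe1[1]);

  // #3: "container2" opens new fds; the kernel hands back the lowest free
  // numbers, i.e. the ones just closed, and the read end is registered with
  // libevent again (this is what io::read does for the CNI plugin's output).
  int pipe2[2];
  if (pipe(pipe2) != 0) { perror("pipe"); return 1; }
  printf("old fd %d, reused fd %d\n", pipe1[0], pipe2[0]);  // typically equal
  struct event* ev2 = event_new(base, pipe2[0], EV_READ, pollCallback, nullptr);
  event_add(ev2, nullptr);

  // #4: the deferred cleanup for container1 finally runs and frees its event.
  // Per the analysis above, the resulting event_del is what ends up disabling
  // the fd number that now belongs to container2's read.
  event_free(ev1);

  // Write to the reused pipe and run the loop once; whether ev2 still fires
  // depends on the backend's bookkeeping for the shared fd number.
  if (write(pipe2[1], "x", 1) != 1) { perror("write"); }
  event_base_loop(base, EVLOOP_NONBLOCK);

  event_free(ev2);
  event_base_free(base);
  return 0;
}
{code}
The point the sketch tries to make is that closing the fd before the deferred
`event_free` has run opens a window in which the kernel can hand the same fd
number to an unrelated poll, so the late `event_free` acts on someone else's
registration.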

> Container stuck at ISOLATING state due to libevent poll never returns
> ---------------------------------------------------------------------
>
>                 Key: MESOS-9334
>                 URL: https://issues.apache.org/jira/browse/MESOS-9334
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>            Reporter: Qian Zhang
>            Assignee: Qian Zhang
>            Priority: Critical
>
> We found that a UCR container may be stuck in the `ISOLATING` state:
> {code:java}
> 2018-10-03 09:13:23: I1003 09:13:23.274561 2355 containerizer.cpp:3122] 
> Transitioning the state of container 1e5b8fc3-5c9e-4159-a0b9-3d46595a5b54 
> from PREPARING to ISOLATING
> 2018-10-03 09:13:23: I1003 09:13:23.279223 2354 cni.cpp:962] Bind mounted 
> '/proc/5244/ns/net' to 
> '/run/mesos/isolators/network/cni/1e5b8fc3-5c9e-4159-a0b9-3d46595a5b54/ns' 
> for container 1e5b8fc3-5c9e-4159-a0b9-3d46595a5b54
> 2018-10-03 09:23:22: I1003 09:23:22.879868 2354 containerizer.cpp:2459] 
> Destroying container 1e5b8fc3-5c9e-4159-a0b9-3d46595a5b54 in ISOLATING state
> {code}
>  In the above logs, the state of container 
> `1e5b8fc3-5c9e-4159-a0b9-3d46595a5b54` was transitioned to `ISOLATING` at 
> 09:13:23, but it did not transition to any other state until it was destroyed 
> due to the executor registration timeout (10 mins). And the destroy can never 
> complete since it needs to wait for the container to finish isolating.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
