[ https://issues.apache.org/jira/browse/MESOS-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353568#comment-16353568 ]

Qian Zhang commented on MESOS-8488:
-----------------------------------

[~greggomann], I discussed this with [~vinodkone], and he suggests that we can 
still go with the wait-PID solution. Although the container process is not a 
child process of the Docker executor, the Docker executor can still call 
`process::reap()` on the container's PID. Once the container process exits, 
the Docker executor will be notified, but without the container process's 
actual exit status.
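To illustrate why a non-parent can learn that a process exited but not its exit status, here is a minimal sketch assuming Linux/POSIX. It is not libprocess's implementation of `process::reap()`; it simply polls `kill(pid, 0)`, which performs only the existence and permission check, until the call fails with `ESRCH`. The names `processExists` and `waitForNonChildExit` are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <signal.h>
#include <sys/types.h>
#include <thread>
#include <unistd.h>

// Returns true while a process with this PID exists. Signal 0 delivers
// nothing; it only checks existence/permissions. A zombie still "exists"
// until its real parent reaps it.
bool processExists(pid_t pid) {
  if (::kill(pid, 0) == 0) {
    return true;
  }
  // EPERM: the process exists, but we are not allowed to signal it.
  return errno == EPERM;
}

// Hypothetical helper: blocks until `pid` no longer exists, checking every
// `interval`. Because the caller is not the parent, it cannot call
// waitpid() on `pid`, so no exit status is available, only the fact
// that the process is gone.
void waitForNonChildExit(pid_t pid, std::chrono::milliseconds interval) {
  assert(interval.count() > 0);
  while (processExists(pid)) {
    std::this_thread::sleep_for(interval);
  }
}
```

Polling trades a little latency (up to one `interval`) for not needing to be the parent. A real implementation would also have to consider PID reuse between checks.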

Here is the RR: https://reviews.apache.org/r/65518/

> Docker bug can cause unkillable tasks
> -------------------------------------
>
>                 Key: MESOS-8488
>                 URL: https://issues.apache.org/jira/browse/MESOS-8488
>             Project: Mesos
>          Issue Type: Improvement
>          Components: containerization
>    Affects Versions: 1.5.0
>            Reporter: Greg Mann
>            Assignee: Qian Zhang
>            Priority: Major
>              Labels: mesosphere
>
> Due to an [issue on the Moby 
> project|https://github.com/moby/moby/issues/33820], it's possible for Docker 
> versions 1.13 and later to fail to catch a container exit, so that the 
> {{docker run}} command which was used to launch the container will never 
> return. This can lead to the Docker executor becoming stuck in a state where 
> it believes the container is still running and cannot be killed.
> We should update the Docker executor to ensure that containers stuck in such 
> a state cannot cause unkillable Docker executors/tasks.
> One way to do this would be a timeout after which the Docker executor 
> commits suicide if a kill-task attempt has not succeeded. However, if we do 
> this, we should also ensure that, if the container was actually still 
> running, either the Docker daemon or the DockerContainerizer cleans up the 
> container when it does exit.
> Another option might be for the Docker executor to directly {{wait()}} on the 
> container's Linux PID, in order to notice when the container exits.
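The timeout option from the issue description can be sketched as follows, assuming Linux/POSIX and that a kill attempt (for example `docker stop`) has already been issued. `waitForKillOrTimeout` and `stillRunning` are hypothetical names, and this is not the Docker executor's actual code. It waits up to a deadline for the container process to disappear and reports whether it did, leaving the "commit suicide" decision to the caller.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <signal.h>
#include <sys/types.h>
#include <thread>
#include <unistd.h>

// Hypothetical helper: true while `pid` still exists (zombies included).
bool stillRunning(pid_t pid) {
  return ::kill(pid, 0) == 0 || errno == EPERM;
}

// Hypothetical sketch of "timeout, then commit suicide": after a kill
// attempt, wait up to `timeout` for the container process to go away,
// checking every `interval`. Returns true if it exited in time; false
// means the caller (the executor) should give up and terminate itself.
bool waitForKillOrTimeout(pid_t pid,
                          std::chrono::milliseconds timeout,
                          std::chrono::milliseconds interval) {
  assert(interval.count() > 0);
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (std::chrono::steady_clock::now() < deadline) {
    if (!stillRunning(pid)) {
      return true;
    }
    std::this_thread::sleep_for(interval);
  }
  // One final check at the deadline.
  return !stillRunning(pid);
}
```

On a `false` result the executor would exit. As the issue notes, that alone is not enough: something (the Docker daemon or the DockerContainerizer) must still clean up the container if it later exits, which this sketch does not cover.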



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
