[ https://issues.apache.org/jira/browse/MESOS-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363428#comment-16363428 ]

Qian Zhang commented on MESOS-8488:
-----------------------------------

commit a7714536fad1140fd0c07c47e32b40e9ed00a3c3
Author: Qian Zhang 
Date: Mon Feb 5 20:42:07 2018 +0800

Reaped the container process directly in Docker executor.
 
 Due to a Docker issue (https://github.com/moby/moby/issues/33820),
 the Docker daemon can fail to catch a container exit: the container
 process has already exited, but `docker ps` still shows the container
 as running. When this happens, the `docker run` command that we execute
 in the Docker executor never returns, and `docker stop` has no effect
 (it returns without error, but `docker ps` still shows the container as
 running), so the task gets stuck in the `TASK_KILLING` state.
 
 To work around this Docker issue, in this patch we made the Docker
 executor reap the container process directly, so that the executor is
 notified as soon as the container process exits.
 
 Review: https://reviews.apache.org/r/65518

> Docker bug can cause unkillable tasks
> -------------------------------------
>
>                 Key: MESOS-8488
>                 URL: https://issues.apache.org/jira/browse/MESOS-8488
>             Project: Mesos
>          Issue Type: Improvement
>          Components: containerization
>    Affects Versions: 1.5.0
>            Reporter: Greg Mann
>            Assignee: Qian Zhang
>            Priority: Major
>              Labels: mesosphere
>             Fix For: 1.6.0
>
>
> Due to an [issue on the Moby 
> project|https://github.com/moby/moby/issues/33820], it's possible for Docker 
> versions 1.13 and later to fail to catch a container exit, so that the 
> {{docker run}} command which was used to launch the container will never 
> return. This can lead to the Docker executor becoming stuck in a state where 
> it believes the container is still running and cannot be killed.
> We should update the Docker executor to ensure that containers stuck in such 
> a state cannot cause unkillable Docker executors/tasks.
> One way to do this would be a timeout, after which the Docker executor would
> commit suicide if a kill task attempt had not succeeded. However, if we do
> this, we should also ensure that, if the container was actually still
> running, either the Docker daemon or the DockerContainerizer cleans up the
> container when it does exit.
> Another option might be for the Docker executor to directly {{wait()}} on the 
> container's Linux PID, in order to notice when the container exits.
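The second option above, noticing the exit by watching the container's PID directly, is also the approach the commit describes. The following is a minimal Python sketch of the idea, assuming Linux; the names `process_exists` and `wait_for_exit` are hypothetical and are not taken from the Mesos code base (which is C++). Because the container's process is generally not a child of the executor, a plain `wait()` on it fails with `ECHILD`, so the sketch falls back to polling with signal 0. The timeout also covers the first option: a caller could give up (for example, terminate the executor) when `wait_for_exit` returns `False`.

```python
import os
import time


def process_exists(pid: int) -> bool:
    """Return True if a process with this PID exists (signal 0 delivers nothing)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def wait_for_exit(pid: int, timeout: float, interval: float = 0.1) -> bool:
    """Block until `pid` exits or `timeout` seconds pass.

    Returns True if the process exited, False if the timeout expired.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # If `pid` is our own child, reap it so it does not linger as a zombie.
            reaped, _status = os.waitpid(pid, os.WNOHANG)
            if reaped == pid:
                return True
        except ChildProcessError:
            # Not our child (or already reaped elsewhere): poll for existence instead.
            if not process_exists(pid):
                return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Polling by PID is subject to PID reuse: if the container exits and the kernel hands its PID to a new process between polls, the sketch would keep waiting. A real implementation would need an extra check (such as comparing the process start time) to rule this out.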



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
