[ https://issues.apache.org/jira/browse/MESOS-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363428#comment-16363428 ]
Qian Zhang commented on MESOS-8488:
-----------------------------------

commit a7714536fad1140fd0c07c47e32b40e9ed00a3c3
Author: Qian Zhang
Date:   Mon Feb 5 20:42:07 2018 +0800

    Reaped the container process directly in Docker executor.

    Due to a Docker issue (https://github.com/moby/moby/issues/33820),
    the Docker daemon can fail to catch a container exit. The container
    process has already exited, but `docker ps` still shows the container
    as running. When this happens, the `docker run` command that we
    execute in the Docker executor never returns, and `docker stop` has
    no effect: it returns without error, but `docker ps` still shows the
    container as running. As a result, the task gets stuck in the
    `TASK_KILLING` state.

    To work around this Docker issue, this patch makes the Docker
    executor reap the container process directly, so that the executor
    is notified as soon as the container process exits.

    Review: https://reviews.apache.org/r/65518

> Docker bug can cause unkillable tasks
> -------------------------------------
>
>                 Key: MESOS-8488
>                 URL: https://issues.apache.org/jira/browse/MESOS-8488
>             Project: Mesos
>          Issue Type: Improvement
>          Components: containerization
>    Affects Versions: 1.5.0
>            Reporter: Greg Mann
>            Assignee: Qian Zhang
>            Priority: Major
>              Labels: mesosphere
>             Fix For: 1.6.0
>
>
> Due to an [issue on the Moby
> project|https://github.com/moby/moby/issues/33820], it's possible for Docker
> versions 1.13 and later to fail to catch a container exit, so that the
> {{docker run}} command which was used to launch the container will never
> return. This can lead to the Docker executor becoming stuck in a state where
> it believes the container is still running and cannot be killed.
>
> We should update the Docker executor to ensure that containers stuck in such
> a state cannot cause unkillable Docker executors/tasks.
>
> One way to do this would be a timeout, after which the Docker executor will
> commit suicide if a kill task attempt has not succeeded. However, if we do
> this we should also ensure that in the case that the container was actually
> still running, either the Docker daemon or the DockerContainerizer would
> clean up the container when it does exit.
>
> Another option might be for the Docker executor to directly {{wait()}} on the
> container's Linux PID, in order to notice when the container exits.
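The commit reaps the container process directly instead of trusting `docker run` to return. Below is a minimal sketch of one way to notice a process exit, assuming Linux/POSIX. It is not the code from r/65518, which may differ. One caveat is that plain `waitpid()`, like the `wait()` mentioned in the issue, only works for the caller's own children. The container process is typically parented by Docker's shim rather than by the executor. For that reason the sketch falls back to probing with `kill(pid, 0)` until it fails with `ESRCH`. The helpers `hasExited` and `waitForExit` are hypothetical names. Within Mesos, libprocess offers `process::reap()` for this kind of wait. The container's PID can be read with `docker inspect --format '{{.State.Pid}}' <container>`.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <thread>

#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: returns true once `pid` no longer exists.
bool hasExited(pid_t pid) {
  // 0 and negative values mean "process group" to waitpid()/kill(),
  // so treat them as "nothing to wait for" instead of probing.
  if (pid <= 0) {
    return true;
  }

  // If `pid` is our own child, reap it so it does not linger as a zombie.
  int status = 0;
  pid_t result = ::waitpid(pid, &status, WNOHANG);
  if (result == pid) {
    return true;   // Our child exited and has now been reaped.
  }
  if (result == 0) {
    return false;  // Our child is still running.
  }

  // waitpid() failed. ECHILD means `pid` is not our child (e.g. a container
  // process parented by Docker's shim), so probe it with signal 0 instead.
  if (errno == ECHILD) {
    // kill(pid, 0) delivers nothing; it fails with ESRCH once the process
    // is gone. EPERM means the process exists but belongs to another user.
    return ::kill(pid, 0) == -1 && errno == ESRCH;
  }

  return false;  // E.g. EINTR: report "still running" and let the caller retry.
}

// Hypothetical helper: blocks until `pid` exits, polling every `interval`.
void waitForExit(pid_t pid,
                 std::chrono::milliseconds interval = std::chrono::milliseconds(100)) {
  while (!hasExited(pid)) {
    std::this_thread::sleep_for(interval);
  }
}
```

Two limits apply to the polling fallback. `kill(pid, 0)` still succeeds on a zombie, so a non-child's exit is only seen once its real parent (the shim) reaps it. A PID can also be reused between probes, so a production version would need a way to confirm identity, such as comparing process start times.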
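The issue also proposes a timeout, after which the executor gives up on a kill attempt that has not succeeded. That option can be sketched as a watchdog that is armed when the kill starts. `KillWatchdog` is a hypothetical class, not Mesos code. In an executor, `onTimeout` might log and call `std::exit()`. Here it is a plain callback so the behavior can be tested. As the issue points out, exiting this way leaves a still-running container to be cleaned up by the Docker daemon or the DockerContainerizer.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical watchdog, armed when a kill attempt starts: unless disarm()
// is called within `timeout`, `onTimeout` runs on a background thread.
class KillWatchdog {
 public:
  KillWatchdog(std::chrono::milliseconds timeout, std::function<void()> onTimeout)
      : thread_([this, timeout, onTimeout = std::move(onTimeout)] {
          std::unique_lock<std::mutex> lock(mutex_);
          // The predicate guards against spurious wakeups and against
          // disarm() being called before this thread starts waiting.
          bool disarmed = cv_.wait_for(lock, timeout, [this] { return disarmed_; });
          if (!disarmed) {
            lock.unlock();
            onTimeout();
          }
        }) {}

  KillWatchdog(const KillWatchdog&) = delete;
  KillWatchdog& operator=(const KillWatchdog&) = delete;

  // Call once the kill is confirmed, i.e. the container process has exited.
  void disarm() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      disarmed_ = true;
    }
    cv_.notify_one();
  }

  ~KillWatchdog() {
    disarm();
    thread_.join();
  }

 private:
  // Declared before thread_ so they are initialized before the thread starts.
  std::mutex mutex_;
  std::condition_variable cv_;
  bool disarmed_ = false;
  std::thread thread_;
};
```

A condition variable with a predicate, rather than a bare sleep, lets a successful kill cancel the watchdog immediately. The destructor joins the thread, so the callback never outlives the object.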