Andrew Schwartzmeyer commented on MESOS-8488:

commit 1daf6cb03
Author: Akash Gupta akash-gu...@hotmail.com
Date:   Sun Feb 25 13:37:42 2018 -0800
Windows: Fixed flaky Docker command health check test.

The `DockerContainerizerHealthCheckTest.ROOT_DOCKER_DockerHealthStatusChange`
test was flaky on Windows because the Docker executor manually reaps
the container exit code in case `docker run` fails to get it. This
logic doesn't work on Windows, since the process might not be visible
to the container host machine, which causes `TASK_FAILED` to be sent.
Removing the reaping logic on Windows makes the test much more reliable.

Review: https://reviews.apache.org/r/65733/

> Docker bug can cause unkillable tasks.
> --------------------------------------
>                 Key: MESOS-8488
>                 URL: https://issues.apache.org/jira/browse/MESOS-8488
>             Project: Mesos
>          Issue Type: Improvement
>          Components: containerization
>    Affects Versions: 1.5.0
>            Reporter: Greg Mann
>            Assignee: Qian Zhang
>            Priority: Major
>              Labels: mesosphere
>             Fix For: 1.6.0
>
> Due to an [issue on the Moby
> project|https://github.com/moby/moby/issues/33820], Docker versions 1.13
> and later can fail to catch a container exit, in which case the
> {{docker run}} command used to launch the container never returns. This
> can leave the Docker executor stuck in a state where it believes the
> container is still running and cannot be killed.
>
> We should update the Docker executor to ensure that containers stuck in
> such a state cannot cause unkillable Docker executors/tasks.
>
> One way to do this would be a timeout, after which the Docker executor
> terminates itself if a kill task attempt has not succeeded. However, if
> we do this, we should also ensure that, if the container was in fact
> still running, either the Docker daemon or the DockerContainerizer
> cleans it up when it does exit.
>
> Another option might be for the Docker executor to directly {{wait()}}
> on the container's Linux PID, in order to notice when the container
> exits.
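
The two options in the description (a kill timeout, and watching the container's PID) can be illustrated together. The following is only a rough sketch in Python, not the Mesos implementation: `pid_exists` and `wait_for_exit` are hypothetical helper names, and polling with signal 0 is an assumption made here because a real `wait()` only works on the caller's own child processes.

```python
import os
import time


def pid_exists(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    try:
        # Signal 0 only performs existence and permission checks;
        # no signal is actually delivered.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but is owned by another user.
        return True
    return True


def wait_for_exit(pid: int, timeout: float, interval: float = 0.1) -> bool:
    """Poll until `pid` disappears or `timeout` seconds elapse.

    Returns True if the process is gone, False if it is still present
    when the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not pid_exists(pid):
            return True
        time.sleep(interval)
    # One final check after the deadline.
    return not pid_exists(pid)
```

A caller could treat a `False` result as the point at which the timeout path takes over. Note that an exited but unreaped child (a zombie) still counts as existing, and that PIDs can be reused, so a production version would need a stronger identity check than the bare PID.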
