[ https://issues.apache.org/jira/browse/MESOS-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804330#comment-16804330 ]
Greg Mann commented on MESOS-9191:
----------------------------------
I just did some local testing which is relevant to this ticket. It seems that
discarding the {{docker stop}} process (in other words, terminating the process
running the {{docker stop}} command) is not sufficient to cancel the pending
SIGKILL.
I ran a simple shell script in a docker container which trapped SIGTERM. If I
issue {{docker stop -t X <container_id>}} at time T0 and then terminate the
{{docker stop}} process, the SIGKILL is still sent by the docker daemon at time
T0+X. Similarly, if I issue the {{docker stop}} command at time T0, terminate
it, and then issue another {{docker stop}} command at time T1, the SIGKILL is
still issued by the docker daemon at time T0+X.
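The timeline above can be sketched as a toy model. This is only an illustration of the behavior I observed, not Docker's implementation; the class {{DaemonModel}} and its method names are hypothetical, and the values T0=0, T1=4, X=10 are made up for the example.

```python
class DaemonModel:
    """Toy model of the observed daemon behavior (hypothetical, not Docker code)."""

    def __init__(self):
        # Time at which the daemon will send SIGKILL; None until a stop arrives.
        self.sigkill_deadline = None

    def docker_stop(self, now, grace):
        # Only the first stop arms the timer. Later stops leave it untouched,
        # which matches the T0/T1 observation described above.
        if self.sigkill_deadline is None:
            self.sigkill_deadline = now + grace

    def terminate_client(self):
        # Terminating the `docker stop` client process does not reach the
        # daemon-side timer, so nothing changes here.
        pass


daemon = DaemonModel()
daemon.docker_stop(now=0, grace=10)   # T0 = 0, X = 10
daemon.terminate_client()             # kill the `docker stop` process
daemon.docker_stop(now=4, grace=10)   # T1 = 4
print(daemon.sigkill_deadline)        # 10, i.e. T0 + X
```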
This means that the killTask retry loop issue is not as bad as expected. If
the docker daemon is behaving well, issuing a single {{docker stop}} is
sufficient to send the SIGKILL after the grace period, regardless of whether we
terminate the {{docker stop}} process or not.
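To make the difference concrete, here is a rough simulation comparing the scenario the ticket worried about (each discard cancels the pending SIGKILL) with what I observed (the first stop's deadline survives). The function {{first_sigkill}} and its parameters are hypothetical names for this sketch; it does not reflect the actual executor or daemon code.

```python
def first_sigkill(retry_interval, grace, horizon, discard_cancels):
    """Return the time of the first SIGKILL, or None if none occurs by `horizon`.

    killTask() arrives at t = 0, retry_interval, 2 * retry_interval, ...
    Each arrival discards the pending `docker stop` and issues a new one.
    If `discard_cancels` is True, discarding also resets the SIGKILL deadline
    (the feared behavior); otherwise the first deadline stays in place.
    """
    deadline = None
    t = 0
    while t <= horizon:
        # A deadline that has already passed means the SIGKILL was delivered
        # (a deadline equal to t is treated as delivered before the retry).
        if deadline is not None and deadline <= t:
            return deadline
        if deadline is None or discard_cancels:
            deadline = t + grace
        t += retry_interval
    if deadline is not None and deadline <= horizon:
        return deadline
    return None


# Retries every 5s with a 10s grace period, simulated for 60s.
print(first_sigkill(5, 10, 60, discard_cancels=True))    # None: never killed
print(first_sigkill(5, 10, 60, discard_cancels=False))   # 10: killed at T0 + X
```

In this model the task only becomes unkillable when the retry interval is shorter than the grace period and discarding resets the deadline; with the observed daemon behavior the SIGKILL always lands at T0+X.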
> Docker command executor may stuck at infinite unkillable loop.
> --------------------------------------------------------------
>
> Key: MESOS-9191
> URL: https://issues.apache.org/jira/browse/MESOS-9191
> Project: Mesos
> Issue Type: Bug
> Components: containerization, docker
> Reporter: Gilbert Song
> Assignee: Andrei Budnik
> Priority: Major
> Labels: containerizer
>
> Due to the change from https://issues.apache.org/jira/browse/MESOS-8574, the
> behavior of the docker command executor with respect to discarding the future
> of docker stop was changed. If a new killTask() is invoked while an existing
> docker stop is still pending, the old one is discarded and the new one is
> executed. This is OK in most cases.
> However, docker stop can take a long time (depending on the grace period and
> whether the application handles SIGTERM). If the framework retries killTask
> more frequently than the grace period (which depends on the killpolicy API, env
> var, or agent flags), then the executor may be stuck forever with unkillable
> tasks, because every time, before the docker stop finishes, its future is
> discarded by the new incoming killTask.
> We should consider reusing the grace period before calling discard() on a
> pending docker stop future.