[ https://issues.apache.org/jira/browse/MESOS-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289484#comment-14289484 ]
Timothy Chen commented on MESOS-2115:
-------------------------------------
I've posted the motivation and a proposed solution in this Google doc;
please take a look if you're interested.
https://docs.google.com/a/mesosphere.io/document/d/1_1oLHXg_aHj_fYCzsjYwox9xvIYNAKIeVjO5BFxsUGI/edit#
> Improve recovering Docker containers when slave is contained
> ------------------------------------------------------------
>
> Key: MESOS-2115
> URL: https://issues.apache.org/jira/browse/MESOS-2115
> Project: Mesos
> Issue Type: Epic
> Components: docker
> Reporter: Timothy Chen
> Assignee: Timothy Chen
> Labels: docker
>
> Currently, when the docker containerizer recovers, it checks the checkpointed
> executor pids to determine which containers are still running, and removes
> any other containers listed by docker ps that it doesn't recognize.
> This is problematic when the slave itself runs in a docker container: when
> the slave container dies, all of its forked processes are removed as well, so
> the checkpointed executor pids are no longer valid.
> We have to assume the docker containers might still be running even though
> the checkpointed executor pids are not.
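
To make the failure mode in the description concrete, here is a rough Python sketch of the recovery decision as described above. It is not the Mesos implementation (which is C++), and it does not reflect the contents of the linked doc. The function name plan_recovery, its parameters, and the trust_pids switch are all hypothetical, introduced only for illustration.

```python
def plan_recovery(checkpointed, running, pid_alive, trust_pids=True):
    """Split containers into (recovered, removed) sets.

    checkpointed: dict of container id -> checkpointed executor pid
                  (hypothetical stand-in for the slave's checkpoint data)
    running:      set of container ids currently listed by `docker ps`
    pid_alive:    callable taking a pid and returning True if it is alive
    trust_pids:   True mimics the behavior described in the issue;
                  False is an assumed alternative that ignores pid liveness
    """
    recovered = set()
    for container_id, pid in checkpointed.items():
        if container_id not in running:
            continue  # container is gone, nothing to recover
        if trust_pids and not pid_alive(pid):
            continue  # executor pid is dead, so the container is not recovered
        recovered.add(container_id)
    # Everything docker still lists but recovery did not claim gets removed.
    removed = set(running) - recovered
    return recovered, removed


# Slave ran inside a docker container and died: every checkpointed pid is dead.
checkpoint = {"c1": 100, "c2": 200}
listed = {"c1", "c2", "c3"}
all_dead = lambda pid: False

current = plan_recovery(checkpoint, listed, all_dead)
relaxed = plan_recovery(checkpoint, listed, all_dead, trust_pids=False)
```

With every pid dead, the pid-based check removes all three listed containers, including c1 and c2, which may in fact still be running. Ignoring pid liveness keeps c1 and c2 and removes only the unknown c3. How recovery should then reconnect to those containers is a separate question that this sketch does not address.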
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)