[
https://issues.apache.org/jira/browse/MESOS-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Timothy Chen reassigned MESOS-1915:
-----------------------------------
Assignee: Timothy Chen
> Docker containers that fail to launch are not killed
> ----------------------------------------------------
>
> Key: MESOS-1915
> URL: https://issues.apache.org/jira/browse/MESOS-1915
> Project: Mesos
> Issue Type: Bug
> Components: slave
> Affects Versions: 0.20.1
> Environment: Mesos 0.20.1 using the docker executor with a private
> docker repository. Images often take up to 5 minutes to launch.
> /etc/mesos-slave/executor_registration_timeout is set to '10mins'
> Reporter: Daniel Hall
> Assignee: Timothy Chen
>
> When we launch docker containers on our Mesos cluster using marathon we have
> noticed that we end up with several docker containers running, with only one
> of them actually being tracked by Mesos. When inspected, the containers both
> have the same start time.
> This seems to be because Mesos gives up on trying to start the container
> after 1min, but fails to clean up the docker container because it is not
> yet running. Eventually the container starts alongside all the other attempts
> Mesos has made and we end up with several containers running with only one
> being tracked by Mesos.
> I've pasted some logs from the slave below, filtered for that particular task,
> but it is pretty easy to replicate in our environment so I'm happy to provide
> further logs, details and analysis as required. This is becoming a bit of a
> problem for us so we are happy to help as much as possible.
> {noformat}
> Oct 13 04:47:42 mesosslave-1 mesos-slave[16647]: I1013 04:47:42.776945 16661
> docker.cpp:743] Starting container 'dd113461-4d18-4170-8e3f-9527e6d7f598' for
> task 'docker-test.11588a48-5294-11e4-adea-42010af0f51e' (and executor
> 'docker-test.11588a48-5294-11e4-adea-42010af0f51e') of framework
> '20140918-022627-519434250-5050-6171-0000'
> Oct 13 04:48:42 mesosslave-1 mesos-slave[16647]: E1013 04:48:42.819563 16664
> slave.cpp:2205] Failed to update resources for container
> dd113461-4d18-4170-8e3f-9527e6d7f598 of executor
> docker-test.11588a48-5294-11e4-adea-42010af0f51e running task
> docker-test.11588a48-5294-11e4-adea-42010af0f51e on status update for
> terminal task, destroying container: No container found
> Oct 13 04:49:29 mesosslave-1 mesos-slave[16647]: I1013 04:49:29.916460 16665
> slave.cpp:2538] Monitoring executor
> 'docker-test.11588a48-5294-11e4-adea-42010af0f51e' of framework
> '20140918-022627-519434250-5050-6171-0000' in container
> 'dd113461-4d18-4170-8e3f-9527e6d7f598'
> Oct 13 04:49:31 mesosslave-1 mesos-slave[16647]: I1013 04:49:31.103175 16663
> docker.cpp:1286] Updated 'cpu.shares' to 102 at
> /cgroup/cpu/docker/6a581f5c2174dc76bcfb2e5b89fd9a4310732c384d93901a8b37da8aeb700468
> for container dd113461-4d18-4170-8e3f-9527e6d7f598
> Oct 13 04:49:31 mesosslave-1 mesos-slave[16647]: I1013 04:49:31.105036 16663
> docker.cpp:1321] Updated 'memory.soft_limit_in_bytes' to 32MB for container
> dd113461-4d18-4170-8e3f-9527e6d7f598
> {noformat}
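
The one-minute give-up interval described in the report can be checked against the timestamps in the log excerpt. The following is only a minimal sketch in Python: the helper name `parse_glog_time` is hypothetical, the year 2014 is an assumption (glog prefixes do not record the year), and reading the "Monitoring executor" line as the moment the container finally came up is an interpretation of the log, not something the report confirms.

```python
from datetime import datetime

ASSUMED_YEAR = "2014"  # assumption: glog stamps omit the year


def parse_glog_time(stamp: str) -> datetime:
    """Parse a glog-style 'MMDD HH:MM:SS.ffffff' stamp (hypothetical helper)."""
    return datetime.strptime(ASSUMED_YEAR + stamp, "%Y%m%d %H:%M:%S.%f")


# Timestamps copied from the three slave log lines quoted in the report.
started = parse_glog_time("1013 04:47:42.776945")    # docker.cpp:743, "Starting container"
gave_up = parse_glog_time("1013 04:48:42.819563")    # slave.cpp:2205, "No container found"
monitored = parse_glog_time("1013 04:49:29.916460")  # slave.cpp:2538, "Monitoring executor"

give_up_after = (gave_up - started).total_seconds()
late_start_after = (monitored - started).total_seconds()

print(f"slave gave up after {give_up_after:.1f}s")
print(f"executor monitored after {late_start_after:.1f}s")
```

Under these assumptions the gap between the first two lines is almost exactly 60 seconds, which matches the reporter's "after 1min" observation, while the same container ID shows activity again roughly 47 seconds later.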