[
https://issues.apache.org/jira/browse/MESOS-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15120864#comment-15120864
]
Gilbert Song commented on MESOS-3573:
-------------------------------------
Hi chenqiang, I may be able to look at this issue again this weekend. Are you
hitting the same issue? If your case differs in any way, it would be helpful
to attach your log.
About this issue, I went through each commit between 0.23 and 0.24, but I did
not see any change related to it. However, something is probably wrong in the
`recover` logic.
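When comparing logs, it may help to list which cleanup steps the slave logged
for each container and to check whether a container is still running after
`docker stop` was logged for it. The following is only a minimal sketch: it
assumes the message wording of the docker.cpp lines in the log quoted below
(other Mesos versions may word them differently), the function names are
hypothetical and not part of Mesos, and the set of running container IDs has
to be supplied by the caller.

```python
import re

# Patterns copied from the docker.cpp messages in the quoted slave log.
# Assumption: other Mesos versions may use different wording.
PATTERNS = {
    "executor_exited": re.compile(
        r"Executor for\s+container '([0-9a-f-]+)' has exited"),
    "destroying": re.compile(
        r"Destroying\s+container '([0-9a-f-]+)'"),
    "docker_stop": re.compile(
        r"Running docker stop\s+on container '([0-9a-f-]+)'"),
}


def container_events(log_text):
    """Map each container ID to the set of cleanup events logged for it."""
    events = {}
    for name, pattern in PATTERNS.items():
        for container_id in pattern.findall(log_text):
            events.setdefault(container_id, set()).add(name)
    return events


def still_running_after_stop(log_text, running_ids):
    """Return IDs that had `docker stop` logged but are in running_ids.

    running_ids is a set of container IDs the caller believes are still
    running; how it is collected is left to the caller (hypothetical input).
    """
    events = container_events(log_text)
    return sorted(
        cid for cid, seen in events.items()
        if "docker_stop" in seen and cid in running_ids
    )
```

A container ID returned by `still_running_after_stop` would match the symptom
described in the issue: the slave logged the stop, yet the container is still
there.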
> Mesos does not kill orphaned docker containers
> ----------------------------------------------
>
> Key: MESOS-3573
> URL: https://issues.apache.org/jira/browse/MESOS-3573
> Project: Mesos
> Issue Type: Bug
> Components: docker, slave
> Reporter: Ian Babrou
> Assignee: haosdent
> Priority: Blocker
> Labels: mesosphere
>
> After upgrade to 0.24.0 we noticed hanging containers appearing. Looks like
> there were changes between 0.23.0 and 0.24.0 that broke cleanup.
> Here's how to trigger this bug:
> 1. Deploy app in docker container.
> 2. Kill corresponding mesos-docker-executor process
> 3. Observe hanging container
> Here are the logs after kill:
> {noformat}
> slave_1 | I1002 12:12:59.362002 7791 docker.cpp:1576] Executor for
> container 'f083aaa2-d5c3-43c1-b6ba-342de8829fa8' has exited
> slave_1 | I1002 12:12:59.362284 7791 docker.cpp:1374] Destroying
> container 'f083aaa2-d5c3-43c1-b6ba-342de8829fa8'
> slave_1 | I1002 12:12:59.363404 7791 docker.cpp:1478] Running docker stop
> on container 'f083aaa2-d5c3-43c1-b6ba-342de8829fa8'
> slave_1 | I1002 12:12:59.363876 7791 slave.cpp:3399] Executor
> 'sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c' of framework
> 20150923-122130-2153451692-5050-1-0000 terminated with signal Terminated
> slave_1 | I1002 12:12:59.367570 7791 slave.cpp:2696] Handling status
> update TASK_FAILED (UUID: 4a1b2387-a469-4f01-bfcb-0d1cccbde550) for task
> sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework
> 20150923-122130-2153451692-5050-1-0000 from @0.0.0.0:0
> slave_1 | I1002 12:12:59.367842 7791 slave.cpp:5094] Terminating task
> sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c
> slave_1 | W1002 12:12:59.368484 7791 docker.cpp:986] Ignoring updating
> unknown container: f083aaa2-d5c3-43c1-b6ba-342de8829fa8
> slave_1 | I1002 12:12:59.368671 7791 status_update_manager.cpp:322]
> Received status update TASK_FAILED (UUID:
> 4a1b2387-a469-4f01-bfcb-0d1cccbde550) for task
> sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework
> 20150923-122130-2153451692-5050-1-0000
> slave_1 | I1002 12:12:59.368741 7791 status_update_manager.cpp:826]
> Checkpointing UPDATE for status update TASK_FAILED (UUID:
> 4a1b2387-a469-4f01-bfcb-0d1cccbde550) for task
> sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework
> 20150923-122130-2153451692-5050-1-0000
> slave_1 | I1002 12:12:59.370636 7791 status_update_manager.cpp:376]
> Forwarding update TASK_FAILED (UUID: 4a1b2387-a469-4f01-bfcb-0d1cccbde550)
> for task sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework
> 20150923-122130-2153451692-5050-1-0000 to the slave
> slave_1 | I1002 12:12:59.371335 7791 slave.cpp:2975] Forwarding the
> update TASK_FAILED (UUID: 4a1b2387-a469-4f01-bfcb-0d1cccbde550) for task
> sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework
> 20150923-122130-2153451692-5050-1-0000 to [email protected]:5050
> slave_1 | I1002 12:12:59.371908 7791 slave.cpp:2899] Status update
> manager successfully handled status update TASK_FAILED (UUID:
> 4a1b2387-a469-4f01-bfcb-0d1cccbde550) for task
> sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework
> 20150923-122130-2153451692-5050-1-0000
> master_1 | I1002 12:12:59.372047 11 master.cpp:4069] Status update
> TASK_FAILED (UUID: 4a1b2387-a469-4f01-bfcb-0d1cccbde550) for task
> sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework
> 20150923-122130-2153451692-5050-1-0000 from slave
> 20151002-120829-2153451692-5050-1-S0 at slave(1)@172.16.91.128:5051
> (172.16.91.128)
> master_1 | I1002 12:12:59.372534 11 master.cpp:4108] Forwarding status
> update TASK_FAILED (UUID: 4a1b2387-a469-4f01-bfcb-0d1cccbde550) for task
> sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework
> 20150923-122130-2153451692-5050-1-0000
> master_1 | I1002 12:12:59.373018 11 master.cpp:5576] Updating the latest
> state of task sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework
> 20150923-122130-2153451692-5050-1-0000 to TASK_FAILED
> master_1 | I1002 12:12:59.373447 11 hierarchical.hpp:814] Recovered
> cpus(*):0.1; mem(*):16; ports(*):[31685-31685] (total: cpus(*):4;
> mem(*):1001; disk(*):52869; ports(*):[31000-32000], allocated:
> cpus(*):8.32667e-17) on slave 20151002-120829-2153451692-5050-1-S0 from
> framework 20150923-122130-2153451692-5050-1-0000
> {noformat}
> Another issue: if you restart mesos-slave on the host with orphaned docker
> containers, they are not getting killed. This was the case before and I hoped
> for this trick to kill hanging containers, but it doesn't work now.
> Marking this as critical because it hoards cluster resources and blocks
> scheduling.