Jan Schlicht created MESOS-9131:
-----------------------------------

             Summary: Health checks launching nested containers while a container is being destroyed lead to unkillable tasks
                 Key: MESOS-9131
                 URL: https://issues.apache.org/jira/browse/MESOS-9131
             Project: Mesos
          Issue Type: Bug
          Components: agent
            Reporter: Jan Schlicht


A container might get stuck in {{DESTROYING}} state if there's a command health 
check that starts new nested containers while its parent container is getting 
destroyed.
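The rejection that shows up in the logs below can be pictured with a toy model. This is a minimal Python sketch, not Mesos code. The {{Agent}} class, its {{launch}}, {{destroy}} and {{launch_nested}} methods, and the reduced state set are hypothetical simplifications. Only the state names, the dotted nested-container IDs, and the wording of the "Parent container ... is in 'DESTROYING' state" error are taken from the logs.

```python
from enum import Enum


class State(Enum):
    # Only the two container states that appear in the logs (hypothetical model).
    RUNNING = "RUNNING"
    DESTROYING = "DESTROYING"


class Agent:
    """Toy agent that tracks container states by ID; not the Mesos implementation."""

    def __init__(self):
        self.states = {}

    def launch(self, container_id):
        self.states[container_id] = State.RUNNING

    def destroy(self, container_id):
        # Mirrors the logged transition "from RUNNING to DESTROYING".
        self.states[container_id] = State.DESTROYING

    def launch_nested(self, parent_id, child_id):
        """Returns None on success, or an error string shaped like the agent's warning."""
        parent_state = self.states[parent_id]
        if parent_state is State.DESTROYING:
            # The nested container is never created while the parent is being destroyed.
            return (f"Failed to launch container {parent_id}.{child_id}: "
                    f"Parent container {parent_id} is in "
                    f"'{parent_state.value}' state")
        self.states[f"{parent_id}.{child_id}"] = State.RUNNING
        return None


agent = Agent()
agent.launch("task")
agent.launch_nested("task", "check-1")  # succeeds while the parent is RUNNING
agent.destroy("task")
print(agent.launch_nested("task", "check-2"))  # rejected: parent is DESTROYING
```

In this model a rejected launch leaves no trace in {{agent.states}}, which loosely matches the follow-up "Attempted to destroy unknown container" warnings in the logs.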

Here are some logs with unrelated lines removed. The 
{{REMOVE_NESTED_CONTAINER}}/{{LAUNCH_NESTED_CONTAINER_SESSION}} sequence keeps 
looping afterwards.
{noformat}
2018-04-16 12:37:54: I0416 12:37:54.235877  3863 containerizer.cpp:2807] Container db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3.0e44d4d7-629f-41f1-80df-4aae9583d133 has exited
2018-04-16 12:37:54: I0416 12:37:54.235914  3863 containerizer.cpp:2354] Destroying container db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3.0e44d4d7-629f-41f1-80df-4aae9583d133 in RUNNING state
2018-04-16 12:37:54: I0416 12:37:54.235932  3863 containerizer.cpp:2968] Transitioning the state of container db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3.0e44d4d7-629f-41f1-80df-4aae9583d133 from RUNNING to DESTROYING
2018-04-16 12:37:54: I0416 12:37:54.236100  3852 linux_launcher.cpp:514] Asked to destroy container db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3.0e44d4d7-629f-41f1-80df-4aae9583d133.e6e01854-40a0-4da3-b458-2b4cf52bbc11
2018-04-16 12:37:54: I0416 12:37:54.237671  3852 linux_launcher.cpp:560] Using freezer to destroy cgroup mesos/db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3/mesos/0e44d4d7-629f-41f1-80df-4aae9583d133/mesos/e6e01854-40a0-4da3-b458-2b4cf52bbc11
2018-04-16 12:37:54: I0416 12:37:54.240327  3852 cgroups.cpp:3060] Freezing cgroup /sys/fs/cgroup/freezer/mesos/db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3/mesos/0e44d4d7-629f-41f1-80df-4aae9583d133/mesos/e6e01854-40a0-4da3-b458-2b4cf52bbc11
2018-04-16 12:37:54: I0416 12:37:54.244179  3852 cgroups.cpp:1415] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos/db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3/mesos/0e44d4d7-629f-41f1-80df-4aae9583d133/mesos/e6e01854-40a0-4da3-b458-2b4cf52bbc11 after 3.814144ms
2018-04-16 12:37:54: I0416 12:37:54.250550  3853 cgroups.cpp:3078] Thawing cgroup /sys/fs/cgroup/freezer/mesos/db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3/mesos/0e44d4d7-629f-41f1-80df-4aae9583d133/mesos/e6e01854-40a0-4da3-b458-2b4cf52bbc11
2018-04-16 12:37:54: I0416 12:37:54.256599  3853 cgroups.cpp:1444] Successfully thawed cgroup /sys/fs/cgroup/freezer/mesos/db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3/mesos/0e44d4d7-629f-41f1-80df-4aae9583d133/mesos/e6e01854-40a0-4da3-b458-2b4cf52bbc11 after 5.977856ms
...
2018-04-16 12:37:54: I0416 12:37:54.371117  3837 http.cpp:3502] Processing LAUNCH_NESTED_CONTAINER_SESSION call for container 'db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3.0e44d4d7-629f-41f1-80df-4aae9583d133.2bfd8eed-b528-493b-8434-04311e453dcd'
2018-04-16 12:37:54: W0416 12:37:54.371692  3842 http.cpp:2758] Failed to launch container db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3.0e44d4d7-629f-41f1-80df-4aae9583d133.2bfd8eed-b528-493b-8434-04311e453dcd: Parent container db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3.0e44d4d7-629f-41f1-80df-4aae9583d133 is in 'DESTROYING' state
2018-04-16 12:37:54: W0416 12:37:54.371826  3840 containerizer.cpp:2337] Attempted to destroy unknown container db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3.0e44d4d7-629f-41f1-80df-4aae9583d133.2bfd8eed-b528-493b-8434-04311e453dcd
...
2018-04-16 12:37:55: I0416 12:37:55.504456  3856 http.cpp:3078] Processing REMOVE_NESTED_CONTAINER call for container 'db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3.0e44d4d7-629f-41f1-80df-4aae9583d133.check-f3a1238c-7f0f-4db3-bda4-c0ea951d46b6'
...
2018-04-16 12:37:55: I0416 12:37:55.556367  3857 http.cpp:3502] Processing LAUNCH_NESTED_CONTAINER_SESSION call for container 'db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3.0e44d4d7-629f-41f1-80df-4aae9583d133.check-0db8bd89-6f19-48c6-a69f-40196b4bc211'
...
2018-04-16 12:37:55: W0416 12:37:55.582137  3850 http.cpp:2758] Failed to launch container db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3.0e44d4d7-629f-41f1-80df-4aae9583d133.check-0db8bd89-6f19-48c6-a69f-40196b4bc211: Parent container db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3.0e44d4d7-629f-41f1-80df-4aae9583d133 is in 'DESTROYING' state
...
2018-04-16 12:37:55: W0416 12:37:55.583330  3844 containerizer.cpp:2337] Attempted to destroy unknown container db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3.0e44d4d7-629f-41f1-80df-4aae9583d133.check-0db8bd89-6f19-48c6-a69f-40196b4bc211
...
{noformat}
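The repeating pattern in these logs can be sketched as a loop. This is a hypothetical Python model, not the Mesos checker: on each check interval it removes the previous check container ({{REMOVE_NESTED_CONTAINER}}) and then tries to launch a new one ({{LAUNCH_NESTED_CONTAINER_SESSION}}). Because nothing in the loop reacts to the parent being {{DESTROYING}}, every interval produces the same failing pair of calls. The {{run_checks}} function, the {{check-N}} naming and the {{max_intervals}} bound are assumptions; the bound exists only so the sketch terminates.

```python
def run_checks(parent_state, max_intervals):
    """Hypothetical model of a command check loop; returns the calls it issues."""
    calls = []
    previous_check = None
    for i in range(max_intervals):
        if previous_check is not None:
            # Clean up the nested container left over from the last check.
            calls.append(("REMOVE_NESTED_CONTAINER", previous_check))
        check_id = f"check-{i}"
        calls.append(("LAUNCH_NESTED_CONTAINER_SESSION", check_id))
        if parent_state == "DESTROYING":
            # The agent rejects the launch, but the loop keeps going anyway.
            calls.append(("FAILED", check_id))
        previous_check = check_id
    return calls


for call in run_checks("DESTROYING", max_intervals=3):
    print(*call)
```

With a {{DESTROYING}} parent, every launch in this model fails, so the number of failures grows with the number of intervals rather than stopping after the first rejection.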

This stops when the framework reconciles and instructs Mesos to kill the task, 
which results in the following log entry:
{noformat}
2018-04-16 13:06:04: I0416 13:06:04.161623  3843 http.cpp:2966] Processing KILL_NESTED_CONTAINER call for container 'db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3.0e44d4d7-629f-41f1-80df-4aae9583d133'
{noformat}
Nothing else related to this container is logged following this line.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)