[ https://issues.apache.org/jira/browse/MESOS-9960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928365#comment-16928365 ]
Benno Evers commented on MESOS-9960:
------------------------------------

[~gilbert], I'm not sure I follow, why do you think it should be closed?

> Agent with cgroup support may destroy containers belonging to unrelated
> agents on startup
> -----------------------------------------------------------------------------------------
>
>                 Key: MESOS-9960
>                 URL: https://issues.apache.org/jira/browse/MESOS-9960
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>    Affects Versions: 1.8.1, 1.9.0, master
>            Reporter: Benno Evers
>            Priority: Major
>
> Let's say I have a mesos cluster with one master and one agent:
> {noformat}
> $ mesos-master --work_dir=/tmp/mesos-master
> $ sudo mesos-agent --work_dir=/tmp/mesos-agent --master=127.0.1.1:5050
> --port=5052 --isolation=docker/runtime
> {noformat}
> where I'm running a simple sleep task:
> {noformat}
> $ mesos-execute --command="sleep 10000" --master=127.0.1.1:5050 --name="sleep"
> I0904 18:40:25.020413 18321 scheduler.cpp:189] Version: 1.8.0
> I0904 18:40:25.020892 18319 scheduler.cpp:342] Using default 'basic' HTTP
> authenticatee
> I0904 18:40:25.021039 18323 scheduler.cpp:525] New master detected at
> master@127.0.1.1:5050
> Subscribed with ID 7d9f5030-cadd-49df-bf1e-daa97a4baab6-0000
> Submitted task 'sleep' to agent 'd59e934c-9e26-490d-9f4a-1e8b4ce06b4e-S1'
> Received status update TASK_STARTING for task 'sleep'
>   source: SOURCE_EXECUTOR
> Received status update TASK_RUNNING for task 'sleep'
>   source: SOURCE_EXECUTOR
> {noformat}
> Next, I start a second agent on the same host as the first one:
> {noformat}
> $ sudo ./src/mesos-agent --work_dir=/tmp/yyyy --master=example.org:5050
> --isolation="linux/seccomp"
> --seccomp_config_dir=`pwd`/3rdparty/libseccomp-2.3.3
> {noformat}
> During startup, this agent detects the container belonging to the other,
> unrelated agent and will attempt to clean it up:
> {noformat}
> 0904 18:30:44.906430 18067 task_status_update_manager.cpp:207] Recovering
> task status update manager
> I0904 18:30:44.906913 18071 containerizer.cpp:797] Recovering Mesos containers
> I0904 18:30:44.910077 18070 linux_launcher.cpp:286] Recovering Linux launcher
> I0904 18:30:44.910347 18070 linux_launcher.cpp:343] Recovered container
> 7f455ed7-6593-41e8-9b29-52ee84d7675b
> I0904 18:30:44.910409 18070 linux_launcher.cpp:437]
> 7f455ed7-6593-41e8-9b29-52ee84d7675b is a known orphaned container
> I0904 18:30:44.910877 18065 containerizer.cpp:1123] Recovering isolators
> I0904 18:30:44.911888 18064 containerizer.cpp:1162] Recovering provisioner
> I0904 18:30:44.913368 18068 provisioner.cpp:498] Provisioner recovery complete
> I0904 18:30:44.913630 18065 containerizer.cpp:1234] Cleaning up orphan
> container 7f455ed7-6593-41e8-9b29-52ee84d7675b
> I0904 18:30:44.913656 18065 containerizer.cpp:2576] Destroying container
> 7f455ed7-6593-41e8-9b29-52ee84d7675b in RUNNING state
> I0904 18:30:44.913666 18065 containerizer.cpp:3278] Transitioning the state
> of container 7f455ed7-6593-41e8-9b29-52ee84d7675b from RUNNING to DESTROYING
> I0904 18:30:44.914687 18064 linux_launcher.cpp:576] Asked to destroy
> container 7f455ed7-6593-41e8-9b29-52ee84d7675b
> I0904 18:30:44.914788 18064 linux_launcher.cpp:618] Destroying cgroup
> '/sys/fs/cgroup/freezer/mesos/7f455ed7-6593-41e8-9b29-52ee84d7675b'
> {noformat}
> killing the sleep task in the process:
> {noformat}
> Received status update TASK_FAILED for task 'sleep'
>   message: 'Executor terminated'
>   source: SOURCE_AGENT
>   reason: REASON_EXECUTOR_TERMINATED
> {noformat}
> After some additional testing, it seems like the value of the `--isolation`
> flag is actually irrelevant: The same behaviour can be observed as long as
> cgroup support is enabled with `--systemd_enable_support`.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)