Sargun Dhillon created MESOS-8716:
-------------------------------------
Summary: Freezer controller is not returned to thaw if task
termination fails
Key: MESOS-8716
URL: https://issues.apache.org/jira/browse/MESOS-8716
Project: Mesos
Issue Type: Bug
Components: agent, containerization
Affects Versions: 1.3.2
Reporter: Sargun Dhillon
This issue is related to https://issues.apache.org/jira/browse/MESOS-8004. A
container may fail to terminate for a variety of reasons. One common reason in
our system is that containers which rely on external storage run fsync before
exiting (fsync on SIGTERM), which can cause the termination to time out.
Even though Mesos has sent the requisite kill signals, the task will never
terminate because the cgroup stays frozen.
The intended behaviour should be that, on failure to terminate, pids.max is
set to 0 (if the pids isolator is running) to prevent further processes from
being created, the cgroup is walked and every process in it is sent SIGKILL,
and the cgroup is then thawed. Once the processes finish thawing, the kill
signal will be delivered and processed, resulting in the container finally
finishing.
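
The sequence proposed above (set pids.max to 0, SIGKILL every process in the
cgroup, then thaw) could look roughly like the sketch below. This is not Mesos
code. The function name force_destroy, its parameters, and the injectable kill
argument are hypothetical and exist only for illustration. It assumes cgroup v1
with separate freezer and pids hierarchies, and that the caller passes the
container's directory in each.

```python
import os
import signal
from pathlib import Path


def force_destroy(freezer_dir, pids_dir=None, kill=os.kill):
    """Hypothetical fallback for a container that failed to terminate.

    freezer_dir: the container's directory in the freezer hierarchy.
    pids_dir:    the container's directory in the pids hierarchy, or None
                 when the pids isolator is not in use.
    kill:        signal-sending function; injectable so the sketch can be
                 exercised without real processes.
    Returns the list of pids that were signalled.
    """
    freezer_dir = Path(freezer_dir)

    # 1. Prevent further processes from being created in the container.
    if pids_dir is not None:
        (Path(pids_dir) / "pids.max").write_text("0")

    # 2. Walk the cgroup (including nested cgroups) and SIGKILL every pid.
    #    Frozen processes cannot act on the signal yet; it stays pending.
    signalled = []
    for root, _dirs, files in os.walk(freezer_dir):
        if "cgroup.procs" not in files:
            continue
        for entry in (Path(root) / "cgroup.procs").read_text().split():
            pid = int(entry)
            try:
                kill(pid, signal.SIGKILL)
            except ProcessLookupError:
                continue  # the process exited in the meantime
            signalled.append(pid)

    # 3. Thaw the cgroup so the pending SIGKILLs are delivered.
    (freezer_dir / "freezer.state").write_text("THAWED")
    return signalled
```

The order matters in this sketch: limiting pids first closes the window in
which a thawed process could fork before its SIGKILL is handled, and thawing
last ensures every process already has the signal pending when it resumes.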
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)