Sargun Dhillon created MESOS-8716:
-------------------------------------

             Summary: Freezer controller is not returned to thaw if task termination fails
                 Key: MESOS-8716
                 URL: https://issues.apache.org/jira/browse/MESOS-8716
             Project: Mesos
          Issue Type: Bug
          Components: agent, containerization
    Affects Versions: 1.3.2
            Reporter: Sargun Dhillon

This issue is related to https://issues.apache.org/jira/browse/MESOS-8004.

A container may fail to terminate for a variety of reasons. One common reason in our system is that containers relying on external storage run fsync before exiting (fsync on SIGTERM), which can cause the termination to time out. Even though Mesos has sent the requisite kill signals, the task will never terminate because the cgroup stays frozen.

The intended behaviour should be that, on failure to terminate, if the pids isolator is running, pids.max is set to 0 to prevent further processes from being created; the cgroup is then walked and every process in it is sent SIGKILL; and finally the cgroup is thawed. Once the processes finish thawing, the kill signal will be delivered and processed, resulting in the container finally finishing.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
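
The intended behaviour described in this issue could be sketched roughly as follows. This is only an illustration in Python, not the Mesos implementation (which is C++). The helper name force_kill_frozen_cgroup, its parameters, and the injectable kill callback are hypothetical. It assumes cgroup v1 file names (pids.max, cgroup.procs, freezer.state) and handles a single cgroup directory rather than a nested hierarchy.

```python
import os
import signal
from pathlib import Path
from typing import Callable, List, Optional


def force_kill_frozen_cgroup(
    freezer_cgroup: Path,
    pids_cgroup: Optional[Path] = None,
    kill: Callable[[int, int], None] = os.kill,
) -> List[int]:
    """Hypothetical helper: SIGKILL every process in a frozen cgroup, then thaw it.

    Returns the PIDs that were signalled.
    """
    # Step 1: if the pids controller is in use, set pids.max to 0 so that
    # no further processes can be created in the cgroup.
    if pids_cgroup is not None:
        (pids_cgroup / "pids.max").write_text("0\n")

    # Step 2: walk the cgroup and send SIGKILL to each process listed in it.
    # Per the issue, the signal is only acted on once the cgroup is thawed.
    killed: List[int] = []
    for entry in (freezer_cgroup / "cgroup.procs").read_text().split():
        pid = int(entry)
        try:
            kill(pid, signal.SIGKILL)
            killed.append(pid)
        except ProcessLookupError:
            # The process exited between listing and signalling.
            pass

    # Step 3: thaw the cgroup so the pending SIGKILLs are delivered.
    (freezer_cgroup / "freezer.state").write_text("THAWED\n")
    return killed
```

The kill callback is injectable only so that the sketch can be exercised against a scratch directory without signalling real processes. On a real agent the two paths would point at the container's directories under the freezer and pids hierarchies.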