Sargun Dhillon created MESOS-8716:
-------------------------------------

             Summary: Freezer controller is not returned to thaw if task termination fails
                 Key: MESOS-8716
                 URL: https://issues.apache.org/jira/browse/MESOS-8716
             Project: Mesos
          Issue Type: Bug
          Components: agent, containerization
    Affects Versions: 1.3.2
            Reporter: Sargun Dhillon


This issue is related to https://issues.apache.org/jira/browse/MESOS-8004. A
container may fail to terminate for a variety of reasons. A common one in our
system involves containers that rely on external storage: they run fsync before
exiting (i.e., they call fsync on SIGTERM), so termination can time out.
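
As an illustration of this failure mode, here is a hypothetical workload handler (not code from the reporter's system) that flushes an open file with os.fsync when it receives SIGTERM. The function name make_fsync_on_sigterm is made up for this sketch.

```python
import os
import signal
import sys


def make_fsync_on_sigterm(fd):
    """Build a SIGTERM handler that flushes fd to storage, then exits.

    os.fsync blocks until the file's data has been written through to
    storage; on slow or unreachable external storage this can take longer
    than the agent's termination timeout, so the process does not exit in time.
    """
    def on_sigterm(signum, frame):
        os.fsync(fd)   # may block for a long time on external storage
        sys.exit(0)    # only reached once the flush completes
    return on_sigterm
```

A workload would register it with signal.signal(signal.SIGTERM, make_fsync_on_sigterm(fd)); if the storage behind fd is slow, the process stays alive past the termination timeout.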

Even though Mesos has sent the requisite kill signals, the task never
terminates because the cgroup stays frozen: the signals remain pending, and the
frozen processes cannot act on them until the cgroup is thawed.

The intended behaviour is that, when termination fails: if the pids isolator is
running, pids.max is set to 0 to prevent further processes from being created;
the cgroup is walked and every process in it is sent SIGKILL; and the cgroup is
then thawed. Once the processes thaw, the pending SIGKILL is delivered and
processed, and the container finally exits.
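
The recovery sequence above could look roughly like the following Python sketch. It assumes cgroup v1 hierarchies mounted as plain directories; the function name force_kill_frozen_cgroup and its parameters are hypothetical and not part of Mesos (whose containerizer is written in C++). It thaws only the top-level cgroup, assuming nested cgroups are not frozen independently, and the kill parameter is injectable so the steps can be exercised without real processes.

```python
import os
import signal
from pathlib import Path


def force_kill_frozen_cgroup(freezer_dir, pids_dir=None, kill=os.kill):
    """Hypothetical recovery path for a container whose termination failed.

    freezer_dir: the container's cgroup in the (v1) freezer hierarchy.
    pids_dir:    the container's cgroup in the pids hierarchy, or None when
                 the pids isolator is not running.
    kill:        signal-sending function; defaults to os.kill.

    Returns every pid found under freezer_dir; SIGKILL was attempted for each.
    """
    freezer = Path(freezer_dir)

    # Step 1: forbid new processes in the container's pids cgroup.
    if pids_dir is not None:
        (Path(pids_dir) / "pids.max").write_text("0")

    # Step 2: walk the cgroup (including nested child cgroups) and send
    # SIGKILL to each process. While frozen, the signal stays pending.
    pids = []
    for procs_file in sorted(freezer.rglob("cgroup.procs")):
        for token in procs_file.read_text().split():
            pids.append(int(token))
    for pid in pids:
        try:
            kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the process already exited

    # Step 3: thaw the cgroup so the pending SIGKILLs are delivered.
    (freezer / "freezer.state").write_text("THAWED")
    return pids
```

pids.max is written first, as the report proposes, so that no new process can appear in the cgroup while the kill and thaw are in progress.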

