Benjamin Mahler created MESOS-759:
-------------------------------------
Summary: The cgroups TasksKiller should skip freezing the cgroup if
it is already empty.
Key: MESOS-759
URL: https://issues.apache.org/jira/browse/MESOS-759
Project: Mesos
Issue Type: Bug
Affects Versions: 0.14.1, 0.14.0, 0.13.0
Reporter: Benjamin Mahler
Assignee: Vinod Kone
Priority: Critical
Fix For: 0.15.0
The current TasksKiller code always freezes the cgroup when trying to kill the
tasks in the cgroup:
void killTasks() {
  // Chain together the steps needed to kill the tasks. Note that we
  // ignore the return values of freeze, kill, and thaw because,
  // provided there are no errors, we'll just retry the chain as
  // long as tasks still exist.
  chain = kill(SIGSTOP)                         // Send stop signal to all tasks.
    .then(defer(self(), &Self::kill, SIGKILL))  // Now send kill signal.
    .then(defer(self(), &Self::empty))          // Wait until cgroup is empty.
    .then(defer(self(), &Self::freeze))         // Freeze cgroup.
    .then(defer(self(), &Self::kill, SIGKILL))  // Send kill signal to any remaining tasks.
    .then(defer(self(), &Self::thaw))           // Thaw cgroup to deliver signals.
    .then(defer(self(), &Self::empty));         // Wait until cgroup is empty.
The code should skip the freeze when the cgroup is already empty. We've seen
instances where the cgroup cannot be frozen, and because the whole procedure is
retried on failure, the code enters a loop of repeated freeze attempts.
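The proposed behavior can be sketched roughly as follows. This is only an
illustration, not the Mesos implementation: FakeCgroup and this synchronous
killTasks are hypothetical stand-ins with no libprocess futures, and the sketch
only shows the "check for empty before freezing" guard.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-in for a cgroup (assumption, not the Mesos API).
struct FakeCgroup {
  std::set<int> pids;      // Tasks currently in the cgroup.
  bool freezable = true;   // Whether a freeze attempt would succeed.
  bool frozen = false;     // Current freezer state.
  int freezeAttempts = 0;  // Number of times freeze() was called.

  bool empty() const { return pids.empty(); }

  // Attempts to freeze the cgroup; returns false if it cannot be frozen.
  bool freeze() {
    ++freezeAttempts;
    frozen = freezable;
    return frozen;
  }

  // Thawing is only meaningful for a frozen cgroup.
  void thaw() {
    assert(frozen);
    frozen = false;
  }

  // Models SIGKILL being delivered to every task in the cgroup.
  void killAll() { pids.clear(); }
};

// Returns true once the cgroup is empty, or false if a step failed and
// the caller would have to retry.
bool killTasks(FakeCgroup& cgroup) {
  if (cgroup.empty()) {
    return true;  // Nothing left to kill, so skip the freeze entirely.
  }

  if (!cgroup.freeze()) {
    return false;  // Freeze failed; only reachable when tasks remain.
  }

  cgroup.killAll();
  cgroup.thaw();
  return cgroup.empty();
}
```

With this guard, an empty cgroup that cannot be frozen no longer triggers the
retry loop, because freeze() is never called for it.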
--
This message was sent by Atlassian JIRA
(v6.1#6144)