-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14024/
-----------------------------------------------------------
Review request for mesos, Benjamin Hindman, Ben Mahler, Eric Biederman, and
Vinod Kone.
Bugs: MESOS-662
https://issues.apache.org/jira/browse/MESOS-662
Repository: mesos-git
Description
-------
I'm posting this partly as an RFC. I favor this approach but am happy to
discuss it here.
The Mesos userspace OOM handler does not respect the practical
restrictions on what it can safely do, given the states the kernel can
be in when the OOM notification arrives. The result has been numerous
deadlocks, because the Mesos OOM handler blocks on a lock that is held
by the task it is trying to kill.
This patch does not try to fix the issues with the OOM handler. Instead,
it hands over the job of OOM-killing to the kernel. The end result is
very similar. The downside, compared to the current approach, is that
by the time the Mesos OOM handler reads memory.stat, the values reflect
the state after the OOM condition occurred. The "maximum usage" is still
captured, but the breakdown at the moment of the OOM is lost. This
exposes another weakness in the memcg implementation regarding page
cache awareness. However, the reliability improvements outweigh the
weakness in stats.
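For discussion, here is a minimal sketch of the idea in terms of the
cgroup v1 memory controller files. It is not the code in this patch: the
helper names (enableKernelOomKiller, parseMemoryStat, readMaxUsage) and
the cgroupDir parameter are hypothetical and do not exist in
src/linux/cgroups.hpp. It assumes that writing "0" to memory.oom_control
leaves the kernel OOM killer enabled, and that the handler only collects
memory.max_usage_in_bytes and memory.stat after the fact.

```cpp
#include <cstdint>
#include <fstream>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper, not part of the Mesos cgroups API.
// Writes "0" to memory.oom_control, which in cgroup v1 leaves the kernel
// OOM killer enabled for the cgroup (writing "1" would disable it).
bool enableKernelOomKiller(const std::string& cgroupDir)
{
  std::ofstream out(cgroupDir + "/memory.oom_control");
  if (!out.is_open()) {
    return false;
  }
  out << "0";
  out.flush();
  return out.good();
}

// Hypothetical helper: parses the "key value" lines of memory.stat
// (for example "cache 4096"). Lines that do not match are skipped.
std::map<std::string, std::uint64_t> parseMemoryStat(std::istream& in)
{
  std::map<std::string, std::uint64_t> stats;
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::string key;
    std::uint64_t value = 0;
    if (fields >> key >> value) {
      stats[key] = value;
    }
  }
  return stats;
}

// Hypothetical helper: reads the peak usage, which survives the OOM kill
// even though the per-category breakdown at that moment does not.
bool readMaxUsage(const std::string& cgroupDir, std::uint64_t* bytes)
{
  std::ifstream in(cgroupDir + "/memory.max_usage_in_bytes");
  return static_cast<bool>(in >> *bytes);
}
```

With this split, the handler never has to kill anything itself, so it
cannot block on a lock held by the victim. It only reports what the
kernel left behind.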
Diffs
-----
src/linux/cgroups.hpp 5ee64d6
src/linux/cgroups.cpp 813dcb3
src/slave/cgroups_isolator.cpp a1f5b32
Diff: https://reviews.apache.org/r/14024/diff/
Testing
-------
Thanks,
David Mackey