-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14024/
-----------------------------------------------------------

Review request for mesos, Benjamin Hindman, Ben Mahler, Eric Biederman, and 
Vinod Kone.


Bugs: MESOS-662
    https://issues.apache.org/jira/browse/MESOS-662


Repository: mesos-git


Description
-------

I'm posting this partly as an RFC. I favor this approach but am happy to
discuss it here.

The Mesos userspace OOM handler does not respect the practical
restrictions that apply to it, given the states the kernel can be in
when it delivers the OOM notification. The result has been numerous
deadlocks: the Mesos OOM handler blocks on a lock that is held by the
task it is trying to kill.

This patch does not try to fix the issues with the OOM handler. Instead,
it hands the job of OOM-killing over to the kernel. The end result is
very similar. The downside, compared to the previous approach, is that
the Mesos OOM handler now reads memory.stat after the OOM condition has
occurred. The "maximum usage" is still captured, but the breakdown at
the time of the OOM is lost. This also exposes another weakness in the
memcg implementation regarding page cache awareness. However, the
reliability improvements outweigh the weaker stats.
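
For readers less familiar with the memcg interface, here is a minimal
sketch of how one might check whether the kernel OOM killer is active for
a cgroup. It assumes the cgroup v1 memory controller, where
memory.oom_control reports "oom_kill_disable 0" when the kernel OOM
killer is enabled and "oom_kill_disable 1" when it is disabled. The
helper names parseKeyValues and kernelOomKillerEnabled are hypothetical
and are not taken from this patch or from the Mesos code base.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the patch): parses the "key value"
// lines of a cgroup v1 control file such as memory.oom_control or
// memory.stat into a map.
std::map<std::string, unsigned long long> parseKeyValues(
    const std::string& contents)
{
  std::map<std::string, unsigned long long> result;
  std::istringstream in(contents);
  std::string key;
  unsigned long long value;
  while (in >> key >> value) {
    result[key] = value;
  }
  return result;
}

// Hypothetical helper (not part of the patch): returns true if the
// contents of memory.oom_control report "oom_kill_disable 0", meaning
// the kernel OOM killer is active for the cgroup. A missing key is
// treated conservatively as "not known to be enabled".
bool kernelOomKillerEnabled(const std::string& oomControl)
{
  const auto values = parseKeyValues(oomControl);
  const auto it = values.find("oom_kill_disable");
  return it != values.end() && it->second == 0;
}
```

The same parser could be applied to the contents of memory.stat to get
the post-OOM breakdown discussed above, with the caveat that those
numbers then describe the cgroup after the OOM condition has occurred.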


Diffs
-----

  src/linux/cgroups.hpp 5ee64d6 
  src/linux/cgroups.cpp 813dcb3 
  src/slave/cgroups_isolator.cpp a1f5b32 

Diff: https://reviews.apache.org/r/14024/diff/


Testing
-------


Thanks,

David Mackey
