-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14024/#review25994
-----------------------------------------------------------

Ship it!

We should definitely enable the OOM killer! I would like to rebase some changes I've been working on onto your changes here. They use memory threshold notifications as a way for us to induce our own "OOM".

I'll describe what a few of us had discussed here:

-> Enable the OOM killer. I'll pull in your change here!
-> Use memory threshold notifications set to the memory limit.
-> When the notification triggers, consider it an OOM and destroy the container. This can still send memory.stat information.
-> If a process is allocating quickly enough to trigger the OOM killer, we'll still receive an OOM notification and process it; the downside is that the memory information will not represent the OOM state. This is because a process has already been killed by the time we're notified of the OOM (as you described).

Do you see any issues with using memory threshold notifications as well?

- Ben Mahler


On Sept. 6, 2013, 11:05 p.m., David Mackey wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14024/
> -----------------------------------------------------------
>
> (Updated Sept. 6, 2013, 11:05 p.m.)
>
>
> Review request for mesos, Benjamin Hindman, Ben Mahler, Eric Biederman, and
> Vinod Kone.
>
>
> Bugs: MESOS-662
>     https://issues.apache.org/jira/browse/MESOS-662
>
>
> Repository: mesos-git
>
>
> Description
> -------
>
> I post this partially as a RFC. I'm in favor of this approach but happy to
> have the discussion here.
>
> The Mesos userspace OOM handler does not conform to the practical
> restrictions imposed upon it given the potential states the kernel can
> be in when it gets the OOM notification. The result of this has been
> numerous deadlocks because the Mesos OOM handler blocks on a lock that
> is being held by the task it is trying to kill.
>
> This patch does not try to fix the issues with the OOM handler. Instead,
> it hands over the job of OOM-killing to the kernel. The end result is
> very similar. The downside to this approach compared to the approach
> it's moving away from is now when the Mesos OOM handler reads the
> memory.stats they will be after the oom condition occurred. The "maximum
> usage" is still captured but the breakdown is lost. This exposes another
> weakness in the memcg implementation regarding page cache awareness.
> However, the reliability improvements outweigh the weakness in stats.
>
>
> Diffs
> -----
>
>   src/linux/cgroups.hpp 5ee64d6
>   src/linux/cgroups.cpp 813dcb3
>   src/slave/cgroups_isolator.cpp a1f5b32
>
> Diff: https://reviews.apache.org/r/14024/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> David Mackey
>
>
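
For reference, the "memory threshold notification" step proposed in the reply above could be registered roughly as sketched below. This is only a sketch under stated assumptions: it targets the cgroup v1 memory controller, where a usage threshold is armed by writing "<event_fd> <fd of memory.usage_in_bytes> <threshold>" to cgroup.event_control. The function names thresholdControlLine and registerMemoryThreshold are hypothetical and are not part of the Mesos cgroups.hpp API or of the patch under review.

```cpp
#include <cstdint>
#include <string>

#include <fcntl.h>
#include <sys/eventfd.h>
#include <unistd.h>

// Hypothetical helper: builds the line written to cgroup.event_control,
// "<event_fd> <fd of memory.usage_in_bytes> <threshold in bytes>".
std::string thresholdControlLine(
    int eventFd, int usageFd, uint64_t thresholdBytes)
{
  return std::to_string(eventFd) + " " + std::to_string(usageFd) + " " +
         std::to_string(thresholdBytes);
}

// Hypothetical helper: arms a usage threshold on the memory cgroup mounted
// at 'cgroupDir'. Returns an eventfd that becomes readable when the
// threshold is crossed, or -1 on failure.
int registerMemoryThreshold(
    const std::string& cgroupDir, uint64_t thresholdBytes)
{
  int eventFd = eventfd(0, EFD_CLOEXEC);
  if (eventFd < 0) {
    return -1;
  }

  int usageFd = open(
      (cgroupDir + "/memory.usage_in_bytes").c_str(), O_RDONLY | O_CLOEXEC);
  int controlFd = open(
      (cgroupDir + "/cgroup.event_control").c_str(), O_WRONLY | O_CLOEXEC);

  bool ok = usageFd >= 0 && controlFd >= 0;
  if (ok) {
    const std::string line =
        thresholdControlLine(eventFd, usageFd, thresholdBytes);
    ok = write(controlFd, line.data(), line.size()) ==
         static_cast<ssize_t>(line.size());
  }

  // Assumption: the kernel keeps its own reference to the usage file once
  // the event is registered, so both helper descriptors are closed here.
  if (usageFd >= 0) {
    close(usageFd);
  }
  if (controlFd >= 0) {
    close(controlFd);
  }

  if (!ok) {
    close(eventFd);
    return -1;
  }
  return eventFd;
}
```

A caller would pass the container's memory limit as thresholdBytes and then read 8 bytes from the returned descriptor; that read blocks until the notification fires, at which point memory.stat can still be collected before the container is destroyed. The kernel documents the threshold event as firing when usage crosses the threshold in either direction, so a handler would likely want to re-check the current usage before treating the event as an OOM.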
