[ https://issues.apache.org/jira/browse/MESOS-47?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Charles Reiss reassigned MESOS-47:
----------------------------------
Assignee: Benjamin Hindman
Ben has apparently started working on this.
> Kill entire containers on OOM with LXC isolation module
> -------------------------------------------------------
>
> Key: MESOS-47
> URL: https://issues.apache.org/jira/browse/MESOS-47
> Project: Mesos
> Issue Type: Improvement
> Components: isolation
> Environment: Linux with container-based isolation
> Reporter: Charles Reiss
> Assignee: Benjamin Hindman
> Labels: lxc
>
> When using the LXC isolation module, the kernel OOM killer kills a victim
> process in the container when the container exceeds its memory limit. When
> the container holds multiple processes, this can cause hard-to-diagnose
> failures, because only one process dies while the rest keep running.
> Instead, Mesos should use the memory cgroup's oom_control feature to disable
> OOM kills (so that processes requesting memory block instead of being
> killed) and have the slave be informed of OOM events via an eventfd. When the
> slave receives an OOM notification on this eventfd, it should kill all
> processes in the over-limit executor's container.
> (These OOM events only happen when a container exceeds its hard memory limit.
> If Mesos overcommits memory in the future, it should have an outer cgroup
> with a hard memory limit and memory.use_hierarchy enabled, on which to
> receive OOM events (so they don't turn into global OOM kills). Mesos will
> then need code to figure out which executors are exceeding their "soft"
> memory limits and to choose a victim executor.)
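A minimal Python sketch of the mechanism described in the issue, assuming a cgroup v1 memory controller with one cgroup directory per executor container. The function names and the directory layout are hypothetical and are not Mesos code. The kernel interfaces used are the real cgroup v1 files `memory.oom_control`, `cgroup.event_control` and `tasks`, plus `os.eventfd` / `os.eventfd_read` (Linux only, Python 3.10+). `choose_victim` illustrates one possible policy for the overcommit case in the last paragraph (pick the executor furthest over its soft limit); that policy is an assumption, not something the issue specifies.

```python
import os
import signal
from typing import Dict, List, Optional, Tuple

# Assumption: a cgroup v1 memory hierarchy with one cgroup directory per
# executor container, e.g. /sys/fs/cgroup/memory/<container> (hypothetical).


def disable_oom_killer(cgroup_dir: str) -> None:
    """Write "1" to memory.oom_control so the kernel OOM killer skips this
    cgroup; tasks that hit the hard limit sleep instead of being killed."""
    with open(os.path.join(cgroup_dir, "memory.oom_control"), "w") as f:
        f.write("1")


def event_control_line(event_fd: int, oom_control_fd: int) -> str:
    """Registration string for cgroup.event_control:
    '<eventfd> <fd of memory.oom_control>'."""
    return f"{event_fd} {oom_control_fd}"


def register_oom_eventfd(cgroup_dir: str) -> Tuple[int, int]:
    """Ask the kernel to signal a new eventfd on each OOM in this cgroup.
    os.eventfd needs Linux and Python 3.10+."""
    efd = os.eventfd(0)
    ofd = os.open(os.path.join(cgroup_dir, "memory.oom_control"), os.O_RDONLY)
    with open(os.path.join(cgroup_dir, "cgroup.event_control"), "w") as f:
        f.write(event_control_line(efd, ofd))
    return efd, ofd


def parse_pids(tasks_text: str) -> List[int]:
    """The cgroup 'tasks' file lists one thread ID per line."""
    return [int(tok) for tok in tasks_text.split()]


def kill_container(cgroup_dir: str) -> None:
    """SIGKILL every task in the container instead of a single victim.
    Signalling any thread ID kills the whole process it belongs to."""
    with open(os.path.join(cgroup_dir, "tasks")) as f:
        pids = parse_pids(f.read())
    for pid in pids:
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # task already exited


def watch_container(cgroup_dir: str) -> None:
    """Slave-side watcher sketch: block until an OOM event, then kill every
    task in the container."""
    disable_oom_killer(cgroup_dir)
    efd, ofd = register_oom_eventfd(cgroup_dir)
    try:
        os.eventfd_read(efd)  # blocks until the kernel reports an OOM event
        kill_container(cgroup_dir)
    finally:
        os.close(efd)
        os.close(ofd)


def choose_victim(usage: Dict[str, int],
                  soft_limit: Dict[str, int]) -> Optional[str]:
    """Hypothetical overcommit policy: return the executor furthest above its
    soft limit (e.g. memory.usage_in_bytes minus memory.soft_limit_in_bytes),
    or None if no executor is over its soft limit."""
    overage = {e: usage[e] - soft_limit[e]
               for e in usage if usage[e] > soft_limit[e]}
    if not overage:
        return None
    return max(overage, key=overage.get)
```

Reading `tasks` once and signalling each ID can race with processes that fork in the meantime; a more careful version might re-read `tasks` until it is empty, or freeze the cgroup before killing.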
--
This message is automatically generated by JIRA.