Ian Downes created MESOS-2978:
---------------------------------
Summary: Provide more debug information when OOMing a container
Key: MESOS-2978
URL: https://issues.apache.org/jira/browse/MESOS-2978
Project: Mesos
Issue Type: Improvement
Components: isolation
Affects Versions: 0.22.1
Reporter: Ian Downes
Priority: Minor
Currently, the cgroup memory isolator logs the contents of {{memory.stat}} when
it detects that a container has OOMed. This is of some use for seeing how the
different types of memory contributed to the OOM, but it provides no
information about the memory usage of specific processes.
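For context, a minimal sketch of the kind of per-type breakdown that {{memory.stat}} allows. It assumes the cgroup v1 format of one "name value" pair per line; the function names and the choice of counters are illustrative and are not taken from the Mesos code.

```python
def parse_memory_stat(text):
    """Parse cgroup v1 memory.stat text ("name value" per line) into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip anything that is not a well-formed "name integer" pair.
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def summarize_memory_stat(stats, keys=("total_cache", "total_rss", "total_mapped_file")):
    """Format the selected counters (hypothetical selection) on one log line."""
    return ", ".join("{}={}".format(k, stats[k]) for k in keys if k in stats)
```

This shows how memory is split by type for the container as a whole, but not which process or thread holds it.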
We should log process (thread) information, e.g., something to the effect of:
{noformat}
[idownes@foobar]$ pwd
/sys/fs/cgroup/memory/mesos/XXXX
[idownes@foobar]$ cat tasks | xargs ps -o pid,tid,stat,time,rss,command -L -p
{noformat}
This output is of variable size (unlike {{memory.stat}}, which is bounded), so
measures should be taken to limit the amount logged.
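One way to bound the output, sketched in Python rather than as the actual isolator change: read the thread IDs from the cgroup's {{tasks}} file, look each one up under /proc, and stop after a fixed number of entries. The function names, the {{max_entries}} cap, and the {{proc_root}} parameter are assumptions made for illustration.

```python
import os


def read_thread_ids(cgroup_dir):
    """Return the thread IDs listed in the cgroup's `tasks` file."""
    with open(os.path.join(cgroup_dir, "tasks")) as f:
        return [int(line) for line in f if line.strip()]


def describe_thread(tid, proc_root="/proc"):
    """Return a one-line description of a thread, or None if it has exited."""
    fields = {}
    try:
        with open(os.path.join(proc_root, str(tid), "status")) as f:
            for line in f:
                key, sep, value = line.partition(":")
                if sep:
                    fields[key] = value.strip()
    except (FileNotFoundError, ProcessLookupError):
        # The thread may already have been killed by the kernel.
        return None
    return "tid={} pid={} state={} rss={} name={}".format(
        tid,
        fields.get("Tgid", "?"),
        fields.get("State", "?"),
        fields.get("VmRSS", "?"),
        fields.get("Name", "?"),
    )


def summarize_cgroup_threads(cgroup_dir, max_entries=50, proc_root="/proc"):
    """Describe at most `max_entries` threads and note how many were omitted."""
    tids = read_thread_ids(cgroup_dir)
    lines = []
    for tid in tids[:max_entries]:
        line = describe_thread(tid, proc_root)
        if line is not None:
            lines.append(line)
    if len(tids) > max_entries:
        lines.append("... {} more thread(s) not shown".format(len(tids) - max_entries))
    return lines
```

Threads that disappear between reading {{tasks}} and reading /proc are skipped silently, so the result can be shorter than the cap.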
Note: the OOM notification from the kernel is asynchronous with respect to the
kernel's OOM handler killing processes, and Mesos in turn observes the
notification asynchronously. Logging is thus best effort: it may lack
information about processes that the kernel has already killed, or nothing may
be logged at all if Mesos reacts to the executor terminating first.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)