Ian Downes created MESOS-2105:
---------------------------------
Summary: Reliably report OOM even if the executor exits normally
Key: MESOS-2105
URL: https://issues.apache.org/jira/browse/MESOS-2105
Project: Mesos
Issue Type: Improvement
Components: isolation
Affects Versions: 0.20.0
Reporter: Ian Downes
Container OOMs are reported asynchronously by the kernel, so the following
sequence can occur:
1) Container OOMs
2) Kernel chooses to kill the task
3) Executor notices, reports TASK_FAILED, then exits
4) MesosContainerizer sees executor exit, *doesn't check for an OOM*, and
destroys the container
5) Memory isolator may or may not have seen the OOM event but the container is
destroyed anyway.
The task is reported as failed, but the report does not include the cause.
Suggestion: always check whether an OOM has occurred, even if the executor
exits normally.
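
A minimal sketch of the suggested control flow, written in Python for brevity
rather than in Mesos's C++. Every name here (Termination, on_executor_exit,
oom_occurred) is hypothetical and does not correspond to an existing Mesos
API. The oom_occurred callable stands in for whatever synchronous query the
memory isolator could offer.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Termination:
    """Hypothetical summary of why a container ended (not a Mesos type)."""
    exit_status: int
    message: str


def on_executor_exit(exit_status: int,
                     oom_occurred: Callable[[], bool]) -> Termination:
    """Build the termination report once the executor has exited.

    `oom_occurred` is an assumed stand-in for asking the memory isolator
    whether the container hit an OOM; it is not an existing Mesos call.
    """
    # Query for an OOM unconditionally, before the container is destroyed,
    # instead of only when the executor exit looks abnormal.
    if oom_occurred():
        return Termination(exit_status, "Memory limit exceeded (OOM)")
    if exit_status == 0:
        return Termination(exit_status, "Executor exited normally")
    return Termination(exit_status,
                       f"Executor exited with status {exit_status}")


if __name__ == "__main__":
    # Executor exited cleanly after reporting TASK_FAILED, but an OOM occurred.
    print(on_executor_exit(0, lambda: True).message)
```

The point of the sketch is only the ordering: the OOM check runs on every
executor exit, so a clean exit in step 3 can no longer hide the cause.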