Ian Downes created MESOS-2105:
---------------------------------

             Summary: Reliably report OOM even if the executor exits normally
                 Key: MESOS-2105
                 URL: https://issues.apache.org/jira/browse/MESOS-2105
             Project: Mesos
          Issue Type: Improvement
          Components: isolation
    Affects Versions: 0.20.0
            Reporter: Ian Downes


Container OOMs are asynchronously reported by the kernel and the following 
sequence can occur:
1) Container OOMs
2) Kernel chooses to kill the task
3) Executor notices, reports TASK_FAILED, then exits
4) MesosContainerizer sees executor exit, *doesn't check for an OOM*, and 
destroys the container
5) Memory isolator may or may not have seen the OOM event but the container is 
destroyed anyway.

The task is reported as failed, but the report does not include the cause (the OOM).

Suggest always checking whether an OOM has occurred, even if the executor exits 
normally, so that the cause can be included when the failure is reported.
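
A minimal sketch of the suggested behaviour, as a toy model rather than Mesos 
code. Every name here (Termination, MemoryIsolator, on_executor_exit, 
oom_count, check_oom) is hypothetical and chosen only for illustration. It 
assumes the isolator can be asked synchronously whether an OOM has happened, 
independent of whether the asynchronous notification has arrived yet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Termination:
    """Hypothetical summary of why a container ended."""
    exit_status: int
    reason: Optional[str] = None  # "OOM" when known, otherwise None


class MemoryIsolator:
    """Hypothetical stand-in for a memory isolator (not the Mesos class)."""

    def __init__(self) -> None:
        self.oom_count = 0     # models a kernel-side record of OOM events
        self.notified = False  # models the asynchronous notification

    def kernel_oom(self) -> None:
        # Steps 1-2: the kernel records the OOM; the notification may lag.
        self.oom_count += 1

    def deliver_notification(self) -> None:
        # The asynchronous event finally reaches the isolator.
        self.notified = True

    def oom_occurred(self) -> bool:
        # Synchronous check that does not depend on the notification.
        return self.notified or self.oom_count > 0


def on_executor_exit(isolator: MemoryIsolator, exit_status: int,
                     check_oom: bool) -> Termination:
    """Step 4: handle an executor exit, optionally checking for an OOM."""
    reason = None
    if check_oom and isolator.oom_occurred():
        reason = "OOM"
    # The container would be destroyed here in either case.
    return Termination(exit_status, reason)


if __name__ == "__main__":
    isolator = MemoryIsolator()
    isolator.kernel_oom()  # OOM happened, notification not yet delivered
    print(on_executor_exit(isolator, 0, check_oom=False))
    print(on_executor_exit(isolator, 0, check_oom=True))
```

With check_oom=False the model reproduces the sequence above: the executor 
exits normally and the cause is lost. With check_oom=True the cause is attached 
even though the notification was never delivered.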



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
