[ 
https://issues.apache.org/jira/browse/MESOS-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202956#comment-14202956
 ] 

Ian Downes commented on MESOS-2052:
-----------------------------------

https://reviews.apache.org/r/27738/

> RunState::recover should always recover 'completed'
> ---------------------------------------------------
>
>                 Key: MESOS-2052
>                 URL: https://issues.apache.org/jira/browse/MESOS-2052
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization, slave
>    Affects Versions: 0.20.0
>            Reporter: Ian Downes
>            Assignee: Ian Downes
>
> RunState::recover() returns partial state if it cannot find or open the
> libprocess pid file. Specifically, it does not recover the 'completed' flag.
> However, if the slave has removed the executor (because the launch failed or
> the executor failed to register), the sentinel flag will be set and this fact
> should be recovered. This ensures that container recovery is not attempted
> later.
> This was discovered when the LinuxLauncher failed to recover because it was
> asked to recover two containers with the same forkedPid. Investigation showed
> that both executors OOM'ed before registering, i.e., no libprocess pid file
> was present. However, the containerizer had detected the OOM, destroyed the
> container, and notified the slave, which cleaned everything up: failing the
> task and calling removeExecutor (which writes the completed sentinel file).
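
The ordering problem described above can be illustrated with a minimal, standalone sketch. It is not the Mesos implementation or the patch under review: the RunState struct here is a simplified stand-in, and the file names, recoverRun() and shouldRecoverContainer() are hypothetical names chosen for illustration. The idea is to read the 'completed' sentinel before attempting to open the libprocess pid file, so that an early return with partial state still carries the flag.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <optional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical file names; the real slave meta directory layout may differ.
const std::string kLibprocessPidFile = "libprocess.pid";
const std::string kCompletedSentinel = "completed.sentinel";

// Simplified stand-in for the recovered state of one executor run.
struct RunState {
  std::optional<std::string> libprocessPid;  // unset if the pid file is missing
  bool completed = false;                    // true if the sentinel file exists
};

// Recovers the state of a single run directory.
RunState recoverRun(const fs::path& runDir) {
  RunState state;

  // Recover the sentinel first, so no early return below can skip it.
  std::error_code ec;
  state.completed = fs::exists(runDir / kCompletedSentinel, ec);

  std::ifstream pidFile(runDir / kLibprocessPidFile);
  if (!pidFile.is_open()) {
    // Partial state: the executor never registered, but 'completed' is
    // already populated at this point.
    return state;
  }

  std::string pid;
  std::getline(pidFile, pid);
  if (!pid.empty()) {
    state.libprocessPid = pid;
  }

  return state;
}

// A run whose executor was already removed should not be handed to the
// launcher for container recovery.
bool shouldRecoverContainer(const RunState& state) {
  return !state.completed;
}
```

With this ordering, a run directory that contains only the sentinel file yields a state with completed set and no libprocess pid, and a caller that filters on shouldRecoverContainer() would skip it instead of passing a stale forkedPid to the launcher.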



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
