[
https://issues.apache.org/jira/browse/MESOS-2601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486043#comment-14486043
]
Ian Downes commented on MESOS-2601:
-----------------------------------
Can you please explain further:
"The Mesos containerizer recovers and all the isolators couldn't recover the
task"
Are you saying that recovery is not successful? If any isolator fails to
recover then the containerizer should fail to recover and the slave shouldn't
start...
Recovery should be successful even if the executor has exited; there will just
be a slight delay until the reaper polls the executor's pid and notices that it
has exited. This is not perfect: while a slave is restarting, an executor could
exit and another process could start with the same pid. The reaper, as
implemented now, will not detect that, and thus the containerizer won't realize
the executor has terminated.
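To illustrate the pid reuse problem, here is a minimal sketch of one common
way to tell a reused pid apart from the original process: record the process
start time alongside the pid and compare both on recovery. This is not how the
Mesos reaper works today. The function names are hypothetical, and the sketch
assumes a Linux /proc filesystem, where field 22 of /proc/<pid>/stat is the
process start time in clock ticks since boot.

```python
from typing import Optional


def parse_start_time(stat_line: str) -> int:
    """Extract starttime (field 22) from the contents of /proc/<pid>/stat."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    # or ')', so split on the LAST ')' before tokenizing the rest.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so field 22 is rest[19].
    return int(rest[19])


def read_start_time(pid: int) -> Optional[int]:
    """Return the start time of a live pid, or None if it cannot be read."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            return parse_start_time(f.read())
    except OSError:
        # No such process (or no /proc on this platform).
        return None


def is_same_process(pid: int, checkpointed_start: int) -> bool:
    """True only if the pid is alive AND has the checkpointed start time."""
    current = read_start_time(pid)
    return current is not None and current == checkpointed_start
```

Under this scheme, a hypothetical recovery path would checkpoint the pair
(pid, start time) when the executor is launched, and treat the executor as
terminated whenever is_same_process() returns False, even if some process
with that pid exists.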
> Tasks are not removed after recovery from slave and mesos containerizer
> -----------------------------------------------------------------------
>
> Key: MESOS-2601
> URL: https://issues.apache.org/jira/browse/MESOS-2601
> Project: Mesos
> Issue Type: Bug
> Components: containerization, slave
> Affects Versions: 0.22.1
> Reporter: Timothy Chen
>
> We've seen in our test cluster that tasks launched with the Mesos
> containerizer are recovered after a slave restart, but the actual command
> process is no longer running and the checkpointed executor is not marked as
> completed.
> The Mesos containerizer recovers and all the isolators couldn't recover the
> task, but the containerizer itself is somehow never removed and the monitor
> kept calling usage on the containerizer.