[
https://issues.apache.org/jira/browse/MESOS-2601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Timothy Chen reassigned MESOS-2601:
-----------------------------------
Assignee: Timothy Chen
> Tasks are not removed after recovery from slave and mesos containerizer
> -----------------------------------------------------------------------
>
> Key: MESOS-2601
> URL: https://issues.apache.org/jira/browse/MESOS-2601
> Project: Mesos
> Issue Type: Bug
> Components: containerization, slave
> Affects Versions: 0.22.1
> Reporter: Timothy Chen
> Assignee: Timothy Chen
>
> We've seen in our test cluster that tasks launched with the Mesos
> containerizer are recovered after a slave restart, but the actual command
> process is no longer running and the checkpointed executor is not marked as
> completed.
> The Mesos containerizer recovers and none of the isolators can recover the
> task, but the containerizer itself is somehow never removed and the monitor
> keeps calling usage on the containerizer.
> Relevant log lines from the beginning of slave recovery:
> I0408 18:06:33.261379 32504 slave.cpp:577] Successfully attached file '/hdd/mesos/slave/slaves/20150401-160104-251662508-5050-2197-S1/frameworks/20141222-194154-218108076-5050-4125-0004/executors/ct:1427921848104:0:EM DataDog Uploader:/runs/990741ed-909e-49cc-83f8-be63298872da'
> ...
> I0408 18:06:36.583277 32511 containerizer.cpp:350] Recovering container '990741ed-909e-49cc-83f8-be63298872da' for executor 'ct:1427921848104:0:EM DataDog Uploader:' of framework 20141222-194154-218108076-5050-4125-0004
> ....
> I0408 18:06:37.017122 32511 linux_launcher.cpp:162] Couldn't find freezer cgroup for container 990741ed-909e-49cc-83f8-be63298872da, assuming already destroyed
> W0408 18:06:37.074916 32496 cpushare.cpp:199] Couldn't find cgroup for container 990741ed-909e-49cc-83f8-be63298872da
> I0408 18:06:37.075173 32486 mem.cpp:158] Couldn't find cgroup for container 990741ed-909e-49cc-83f8-be63298872da
> E0408 18:06:37.092279 32496 containerizer.cpp:1136] Error in a resource limitation for container 990741ed-909e-49cc-83f8-be63298872da: Unknown container
> I0408 18:06:37.092643 32496 containerizer.cpp:906] Destroying container '990741ed-909e-49cc-83f8-be63298872da'
> W0408 18:06:37.229626 32501 containerizer.cpp:807] Ignoring update for currently being destroyed container: 990741ed-909e-49cc-83f8-be63298872da
> W0408 18:06:38.129873 32484 containerizer.cpp:844] Skipping resource statistic for container 990741ed-909e-49cc-83f8-be63298872da because: Unknown container
> W0408 18:06:38.129909 32484 containerizer.cpp:844] Skipping resource statistic for container 990741ed-909e-49cc-83f8-be63298872da because: Unknown container
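To check other slave logs for the same symptom, a small script along these lines might help. It is only a sketch: the helper name `stale_containers` is hypothetical, the two regular expressions are derived solely from the message texts quoted above, and it assumes each log message sits on a single line (not wrapped as in a mail client). It counts "Skipping resource statistic" warnings that appear after a "Destroying container" message for the same container ID.

```python
import re
from collections import Counter

# Message shapes taken from the log excerpt in this report (assumed stable).
DESTROYING = re.compile(r"Destroying container '([0-9a-f-]+)'")
SKIPPING = re.compile(
    r"Skipping resource statistic for container ([0-9a-f-]+) "
    r"because: Unknown container"
)


def stale_containers(lines):
    """Count usage warnings logged after a container was already being destroyed.

    Returns a dict mapping container ID to the number of such warnings.
    """
    destroyed = set()
    stale = Counter()
    for line in lines:
        match = DESTROYING.search(line)
        if match:
            destroyed.add(match.group(1))
            continue
        match = SKIPPING.search(line)
        if match and match.group(1) in destroyed:
            stale[match.group(1)] += 1
    return dict(stale)


# Sample input: three of the messages quoted above, one per line.
CID = "990741ed-909e-49cc-83f8-be63298872da"
sample = [
    "I0408 18:06:37.092643 32496 containerizer.cpp:906] "
    "Destroying container '%s'" % CID,
    "W0408 18:06:38.129873 32484 containerizer.cpp:844] Skipping resource "
    "statistic for container %s because: Unknown container" % CID,
    "W0408 18:06:38.129909 32484 containerizer.cpp:844] Skipping resource "
    "statistic for container %s because: Unknown container" % CID,
]
result = stale_containers(sample)
```

A non-empty result for a container ID suggests the monitor is still polling a container that the containerizer no longer knows about, which matches the behaviour described in this issue.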
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)