Benno Evers created MESOS-8704:
----------------------------------
Summary: Removing `work_dir` can trigger assertion failure in the
mesos containerizer
Key: MESOS-8704
URL: https://issues.apache.org/jira/browse/MESOS-8704
Project: Mesos
Issue Type: Bug
Reporter: Benno Evers
This was reported to me by [~jeschkies], so I might be missing some details.
Starting a Mesos agent with the flag `--containerizers=mesos,docker`, using
Marathon to run a task group on this agent, then stopping the agent, removing
the `work_dir` folder, and restarting the agent with the flag
`--containerizers=mesos` leads to the following crash during recovery:
{noformat}
I0319 15:58:03.865108 121364480 containerizer.cpp:674] Recovering containerizer
F0319 15:58:03.867717 121364480 containerizer.cpp:919] CHECK_SOME(container->directory): is NONE
*** Check failure stack trace: ***{noformat}
After a reboot, things seemed to be working fine again.
Since we read container IDs from `runtime_dir` during recovery, and that
directory wasn't cleaned between agent restarts, it seems like we're missing
some validation for the case where the agent restarts from a half-dirty state.
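
To illustrate the kind of validation that might be missing, here is a minimal, self-contained sketch. It is not the actual Mesos containerizer code: `RecoveredContainer`, `RecoveryResult` and `validateRecovered` are hypothetical names, and plain C++17 (`std::optional`, `std::filesystem`) stands in for the stout types used in Mesos. The idea is that a container whose ID was found in `runtime_dir`, but whose sandbox directory is unknown or no longer exists under `work_dir`, gets set aside as an orphan instead of hitting a hard check.

```cpp
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical stand-in for the per-container state rebuilt during recovery.
struct RecoveredContainer {
  std::string id;                     // container ID read from runtime_dir
  std::optional<fs::path> directory;  // sandbox under work_dir, if known
};

// Hypothetical result type: containers that can be recovered normally,
// and IDs of containers that need to be cleaned up instead.
struct RecoveryResult {
  std::vector<RecoveredContainer> recoverable;
  std::vector<std::string> orphans;
};

// Splits the recovered containers instead of asserting on each one.
// A container is recoverable only if its sandbox directory is known
// and still exists on disk.
RecoveryResult validateRecovered(
    const std::vector<RecoveredContainer>& containers) {
  RecoveryResult result;
  for (const auto& container : containers) {
    std::error_code ec;  // non-throwing overload; errors count as "missing"
    if (container.directory.has_value() &&
        fs::is_directory(*container.directory, ec)) {
      result.recoverable.push_back(container);
    } else {
      result.orphans.push_back(container.id);
    }
  }
  return result;
}
```

Whether such orphans should be destroyed, logged, or ignored is a separate design decision; the sketch only shows that the half-dirty state can be detected before anything assumes the directory is present.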
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)