Benno Evers created MESOS-8704:
----------------------------------

             Summary: Removing `work_dir` can trigger assertion failure in the 
mesos containerizer
                 Key: MESOS-8704
                 URL: https://issues.apache.org/jira/browse/MESOS-8704
             Project: Mesos
          Issue Type: Bug
            Reporter: Benno Evers


This was reported to me by [~jeschkies], so I might be missing some details.

To reproduce: start a Mesos agent with the flag `--containerizers=mesos,docker` 
and use Marathon to run a task group on this agent. Then stop the agent, remove 
the `work_dir` folder, and restart the agent with the flag 
`--containerizers=mesos`. The agent then crashes during recovery:
{noformat}
I0319 15:58:03.865108 121364480 containerizer.cpp:674] Recovering containerizer
F0319 15:58:03.867717 121364480 containerizer.cpp:919] CHECK_SOME(container->directory): is NONE
*** Check failure stack trace: ***{noformat}
After a reboot, things seemed to be working fine again.

Since we read container IDs from `runtime_dir` during recovery, and that 
directory wasn't cleaned between agent restarts, it seems like we're missing 
some validation for the case where the agent restarts from a half-dirty state.
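
To illustrate the kind of validation I have in mind, here is a minimal standalone sketch. It is not Mesos code: the `RecoveryPlan` struct, the `planRecovery` function and the simplified `<runtime_dir>/containers/<id>` and `<work_dir>/sandboxes/<id>` layout are all made up for the example. The idea is only that a container ID found in `runtime_dir` without a matching directory in `work_dir` gets reported as an orphan instead of tripping a CHECK.

```cpp
#include <algorithm>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical result type: which container IDs can be recovered normally
// and which ones only have leftover runtime state.
struct RecoveryPlan {
  std::vector<std::string> recoverable;
  std::vector<std::string> orphans;
};

// Assumed (simplified) layout, NOT the real Mesos one:
//   <runtimeDir>/containers/<id>   runtime state, survives removing work_dir
//   <workDir>/sandboxes/<id>       sandbox directory, gone with work_dir
RecoveryPlan planRecovery(const fs::path& runtimeDir, const fs::path& workDir)
{
  RecoveryPlan plan;

  const fs::path containers = runtimeDir / "containers";
  if (!fs::is_directory(containers)) {
    return plan;  // Nothing to recover.
  }

  for (const fs::directory_entry& entry : fs::directory_iterator(containers)) {
    if (!entry.is_directory()) {
      continue;
    }

    const std::string id = entry.path().filename().string();

    // Validate instead of asserting: a missing sandbox marks the container
    // as an orphan to be cleaned up, rather than aborting the agent.
    if (fs::is_directory(workDir / "sandboxes" / id)) {
      plan.recoverable.push_back(id);
    } else {
      plan.orphans.push_back(id);
    }
  }

  // Directory iteration order is unspecified; sort for stable output.
  std::sort(plan.recoverable.begin(), plan.recoverable.end());
  std::sort(plan.orphans.begin(), plan.orphans.end());

  return plan;
}
```

With something along these lines, the half-dirty state from this report would put every leftover container into `orphans`, and recovery could decide to destroy them or log a warning rather than crash.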



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
