[ 
https://issues.apache.org/jira/browse/MESOS-5238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254321#comment-15254321
 ] 

Gilbert Song commented on MESOS-5238:
-------------------------------------

This bug is caused by a race in the Mesos containerizer. The agent log shows 
two containerizer destroy calls, which should not be allowed. This happens 
because the first call to containerizer::destroy changes the container state 
from PROVISIONING to DESTROYING, which is fine. But in destroy, the 
containerizer has to wait for all provisioners to finish. If the await() is 
waiting on the second provision(), then once that provision() finishes it 
invokes prepare, which changes the container state back to PREPARING. That is 
incorrect.

So the race arises because we do not check whether the container is being 
destroyed when the container is being prepared by the isolators.
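
A rough sketch of the missing check, as a simplified stand-alone model rather 
than the actual containerizer code. The Containerizer struct, its containers 
map, and the bool return value of prepare are hypothetical names made up for 
illustration; only the state names come from the description above.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified model of the container states discussed above.
// This is NOT the real Mesos containerizer; it only illustrates the guard.
enum class State { PROVISIONING, PREPARING, DESTROYING };

struct Containerizer {
  std::map<std::string, State> containers;

  // A new container starts out in PROVISIONING.
  void launch(const std::string& id) {
    containers[id] = State::PROVISIONING;
  }

  // destroy() marks the container as DESTROYING. In the real code it would
  // then wait for the outstanding provision() to finish.
  void destroy(const std::string& id) {
    auto it = containers.find(id);
    if (it == containers.end()) {
      return;
    }
    it->second = State::DESTROYING;
  }

  // Continuation that runs once provision() completes.
  // Returns false when preparation is skipped.
  bool prepare(const std::string& id) {
    auto it = containers.find(id);
    if (it == containers.end()) {
      return false;
    }

    // The guard: never move a container that is being destroyed back to
    // PREPARING.
    if (it->second == State::DESTROYING) {
      return false;
    }

    it->second = State::PREPARING;
    return true;
  }
};
```

With this guard, a prepare that fires after destroy leaves the state at 
DESTROYING instead of resetting it to PREPARING.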

> CHECK failure in AppcProvisionerIntegrationTest.ROOT_SimpleLinuxImageTest
> -------------------------------------------------------------------------
>
>                 Key: MESOS-5238
>                 URL: https://issues.apache.org/jira/browse/MESOS-5238
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.28.0, 0.28.1
>         Environment: CentOS 7 + SSL, x86-64
>            Reporter: Neil Conway
>            Assignee: Gilbert Song
>              Labels: flaky, mesosphere
>             Fix For: 0.29.0
>
>         Attachments: 5238_check_failure.txt
>
>
> Observed on the Mesosphere internal CI:
> {noformat}
> [22:56:28]W:     [Step 10/10] F0420 22:56:28.056788   629 containerizer.cpp:1634] Check failed: containers_.contains(containerId)
> {noformat}
> Complete test log will be attached as a file.



