[
https://issues.apache.org/jira/browse/MESOS-5238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254321#comment-15254321
]
Gilbert Song commented on MESOS-5238:
-------------------------------------
This bug is caused by a race in the Mesos containerizer. The agent log shows
two containerizer destroy calls being invoked, which should not be allowed. It
happens because the first call to containerizer::destroy changes the container
state from PROVISIONING to DESTROYING, which is fine. But in destroy, the
containerizer has to wait for all provisioners to finish. If the await() is
waiting on the second provision(), then once that provision() finishes, it
invokes prepare, which changes the container state back to PREPARING. That is
incorrect. So the race arises because we do not check whether the container is
being destroyed when it is being prepared by the isolators.
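
To make the interleaving concrete, here is a minimal sketch of the state
transitions described above, with the missing check added. It is a simplified,
single-threaded model and not the actual Mesos code: the Containerizer struct,
its method signatures, and replayRace() are hypothetical names used only for
illustration, and the real destroy() waits asynchronously on futures.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified model of the container states named in the comment.
enum class State { PROVISIONING, PREPARING, DESTROYING };

struct Containerizer {
  std::map<std::string, State> containers_;

  // A newly launched container starts out provisioning its image.
  void launch(const std::string& id) {
    containers_[id] = State::PROVISIONING;
  }

  // First step of destroy: mark the container as DESTROYING. In the real
  // code, destroy would then wait for any pending provision() to finish.
  void destroy(const std::string& id) {
    auto it = containers_.find(id);
    if (it == containers_.end()) {
      return;
    }
    it->second = State::DESTROYING;
  }

  // Continuation that runs once provision() completes. Returns true only if
  // the container actually moved to PREPARING.
  bool prepare(const std::string& id) {
    auto it = containers_.find(id);
    if (it == containers_.end()) {
      return false;
    }
    // The guard the comment says is missing: never move a container that is
    // being destroyed back to PREPARING.
    if (it->second == State::DESTROYING) {
      return false;
    }
    it->second = State::PREPARING;
    return true;
  }
};

// Replays the problematic ordering: destroy() is called while provisioning
// is still pending, and the provision() continuation fires afterwards.
bool replayRace() {
  Containerizer c;
  c.launch("c1");
  assert(c.containers_.at("c1") == State::PROVISIONING);

  c.destroy("c1");                  // PROVISIONING -> DESTROYING
  bool prepared = c.prepare("c1");  // late continuation from provision()

  // With the guard in place, the state stays DESTROYING.
  return !prepared && c.containers_.at("c1") == State::DESTROYING;
}
```

Without the DESTROYING check in prepare(), the same sequence would leave the
container in PREPARING after destroy had already started, which is the
incorrect transition described above.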
> CHECK failure in AppcProvisionerIntegrationTest.ROOT_SimpleLinuxImageTest
> -------------------------------------------------------------------------
>
> Key: MESOS-5238
> URL: https://issues.apache.org/jira/browse/MESOS-5238
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 0.28.0, 0.28.1
> Environment: CentOS 7 + SSL, x86-64
> Reporter: Neil Conway
> Assignee: Gilbert Song
> Labels: flaky, mesosphere
> Fix For: 0.29.0
>
> Attachments: 5238_check_failure.txt
>
>
> Observed on the Mesosphere internal CI:
> {noformat}
> [22:56:28]W: [Step 10/10] F0420 22:56:28.056788 629 containerizer.cpp:1634] Check failed: containers_.contains(containerId)
> {noformat}
> Complete test log will be attached as a file.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)