[ 
https://issues.apache.org/jira/browse/MESOS-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767776#comment-16767776
 ] 

Meng Zhu commented on MESOS-5048:
---------------------------------

commit 1875b5380de926ce1759715227883418e2fb9717
Author: Meng Zhu <[email protected]>
Date:   Wed Jan 2 20:51:25 2019 -0800

    Fixed test `MesosContainerizerSlaveRecoveryTest.ResourceStatistics`.

    `MesosContainerizerSlaveRecoveryTest.ResourceStatistics` is flaky
    due to a race between executor shutdown (because the executor never
    receives any tasks) and the test querying resource statistics. If
    the executor is shut down before the statistics query, the test
    fails.

    This patch fixes the test by explicitly waiting for the task to
    be delivered and for the task status to transition to
    `TASK_RUNNING` before restarting the agent. This way, the executor
    is not shut down after the agent restarts, so there is no race.

    Review: https://reviews.apache.org/r/69656
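
The ordering described in the commit message can be illustrated with a small standalone sketch. This is not the Mesos test code: `ToyAgent`, `TaskState`, and `containersAfterRestart` are hypothetical names invented for illustration, and the rule "an executor without a task loses its container on restart" is a simplified model of the behavior the commit message describes. The sketch only shows the synchronization idea, which is to block on the status update before restarting.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <thread>

// Hypothetical stand-in for a task state; not the Mesos protobuf enum.
enum class TaskState { TASK_STAGING, TASK_RUNNING };

// Hypothetical toy agent. It models one rule from the commit message:
// an executor that never received a task is shut down, which removes
// its container.
struct ToyAgent {
  bool executorHasTask = false;
  std::size_t containers = 1;

  void deliverTask() { executorHasTask = true; }

  void restart() {
    if (!executorHasTask) {
      containers = 0;
    }
  }
};

// Returns the number of containers left after the agent restarts.
std::size_t containersAfterRestart() {
  ToyAgent agent;
  std::promise<TaskState> running;
  std::future<TaskState> status = running.get_future();

  // The task is delivered asynchronously, on another thread.
  std::thread launcher([&agent, &running] {
    agent.deliverTask();
    running.set_value(TaskState::TASK_RUNNING);
  });

  // The fix: block until the status update arrives before restarting.
  // The promise/future pair also orders the write to `executorHasTask`
  // before the read inside `restart()`.
  TaskState state = status.get();
  assert(state == TaskState::TASK_RUNNING);
  (void)state;

  agent.restart();
  launcher.join();

  return agent.containers;
}
```

Because the restart happens only after the status update has been observed, `containersAfterRestart()` returns 1 in this model. Calling `agent.restart()` without first waiting on `status` would reintroduce the race.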

> MesosContainerizerSlaveRecoveryTest.ResourceStatistics is flaky
> ---------------------------------------------------------------
>
>                 Key: MESOS-5048
>                 URL: https://issues.apache.org/jira/browse/MESOS-5048
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.28.0
>         Environment: Ubuntu 15.04, Ubuntu 16.04
>            Reporter: Jian Qiu
>            Assignee: Meng Zhu
>            Priority: Major
>              Labels: flaky-test
>             Fix For: 1.8.0
>
>         Attachments: ResourceStatistics-badrun2.txt, 
> ResourceStatistics-badrun3.txt, ResourceStatistics-badrun4.txt
>
>
> ./mesos-tests.sh 
> --gtest_filter=MesosContainerizerSlaveRecoveryTest.ResourceStatistics 
> --gtest_repeat=100 --gtest_break_on_failure
> This was found in rb and reproduced on my local machine. There are two types 
> of failures. However, the failures do not appear when verbose output is enabled...
> {code}
> ../../src/tests/environment.cpp:790: Failure
> Failed
> Tests completed with child processes remaining:
> -+- 1446 /mesos/mesos-0.29.0/_build/src/.libs/lt-mesos-tests 
>  \-+- 9171 sh -c /mesos/mesos-0.29.0/_build/src/mesos-executor 
>    \--- 9185 /mesos/mesos-0.29.0/_build/src/.libs/lt-mesos-executor 
> {code}
> And
> {code}
> I0328 15:42:36.982471  5687 exec.cpp:150] Version: 0.29.0
> I0328 15:42:37.008765  5708 exec.cpp:225] Executor registered on slave 
> 731fb93b-26fe-4c7c-a543-fc76f106a62e-S0
> Registered executor on mesos
> ../../src/tests/slave_recovery_tests.cpp:3506: Failure
> Value of: containers.get().size()
>   Actual: 0
> Expected: 1u
> Which is: 1
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
