[ 
https://issues.apache.org/jira/browse/MESOS-9208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603662#comment-16603662
 ] 

Chun-Hung Hsiao edited comment on MESOS-9208 at 9/4/18 10:25 PM:
-----------------------------------------------------------------

An interesting observation is that in good runs, the duration between the 
following log lines is about 150ms:
{noformat}
I0904 17:07:45.835467  1316 executor.cpp:693] Forked command at 1325
I0904 17:07:45.985908  1317 executor.cpp:994] Command exited with status 0 (pid: 1325)
{noformat}
But in the bad run it took nearly 2 seconds (about 1.86s):
{noformat}
I0904 17:13:05.861194  2022 executor.cpp:693] Forked command at 2027
I0904 17:13:07.720567  2022 executor.cpp:994] Failed to get exit status for Command (pid: 2027)
{noformat}
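
For anyone who wants to check the arithmetic, here is a small sketch that computes the gap between two of the timestamps above. The helper names {{parseMicros}} and {{elapsedMicros}} are made up for this comment and are not part of Mesos; the sketch assumes both timestamps fall on the same day and carry exactly six fractional digits, as in the log lines above.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not Mesos code): converts the "HH:MM:SS.uuuuuu" field
// of a glog-style line into microseconds since midnight. Returns -1 if the
// field does not match that shape. Assumes exactly six fractional digits.
long long parseMicros(const std::string& ts) {
  int h = 0, m = 0, s = 0, us = 0;
  if (std::sscanf(ts.c_str(), "%2d:%2d:%2d.%6d", &h, &m, &s, &us) != 4) {
    return -1;
  }
  return ((h * 60LL + m) * 60LL + s) * 1000000LL + us;
}

// Hypothetical helper: microseconds elapsed between two timestamps taken on
// the same day. The inputs are assumed to be well formed.
long long elapsedMicros(const std::string& start, const std::string& end) {
  return parseMicros(end) - parseMicros(start);
}
```

Calling {{elapsedMicros("17:07:45.835467", "17:07:45.985908")}} gives 150441 microseconds for the good run, and {{elapsedMicros("17:13:05.861194", "17:13:07.720567")}} gives 1859373 microseconds for the bad run.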



> Test `StorageLocalResourceProviderTest.ROOT_PublishResourcesReboot` is flaky.
> -----------------------------------------------------------------------------
>
>                 Key: MESOS-9208
>                 URL: https://issues.apache.org/jira/browse/MESOS-9208
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 1.8.0
>            Reporter: Chun-Hung Hsiao
>            Priority: Major
>              Labels: flaky-test, storage
>         Attachments: bad_run.txt
>
>
> Test {{StorageLocalResourceProviderTest.ROOT_PublishResourcesReboot}} is 
> observed to be flaky on ubuntu-16.04 with a plain build (i.e., no special 
> configuration):
> {noformat}
> ../../src/tests/storage_local_resource_provider_tests.cpp:2393
>       Expected: TASK_FINISHED
> To be equal to: taskFinished->state()
>       Which is: TASK_FAILED
> {noformat}
> However, further investigation shows that the task failed due to the 
> following error:
> {noformat}
> executor.cpp:994] Failed to get exit status for Command (pid: 2027)
> {noformat}
> This indicates that the executor couldn't reap the forked child, possibly 
> because something else had already reaped it.
> This doesn't sound like flakiness specific to this particular test, so I'll 
> leave the test enabled for now.
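
To illustrate the "already reaped" hypothesis from the description: on POSIX, a child's exit status can be collected only once, and a second {{waitpid}} on the same pid fails with {{ECHILD}}. The sketch below is plain POSIX code written for this comment, not the Mesos executor or its reaping logic; {{forkChild}}, {{reap}}, and {{ReapResult}} are hypothetical names.

```cpp
#include <cerrno>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks a child that exits immediately with `code`.
// Returns the child's pid in the parent, or -1 if fork() failed.
pid_t forkChild(int code) {
  pid_t pid = fork();
  if (pid == 0) {
    _exit(code);  // Child: exit without running parent-side cleanup.
  }
  return pid;
}

// Outcome of one attempt to collect a child's exit status.
struct ReapResult {
  bool reaped;   // True if waitpid() returned the child's status.
  int exitCode;  // Exit code if the child exited normally, else -1.
  int error;     // errno from waitpid() on failure, else 0.
};

// Hypothetical helper: blocks until `pid` can be reaped, retrying on EINTR.
ReapResult reap(pid_t pid) {
  int status = 0;
  pid_t r;
  do {
    r = waitpid(pid, &status, 0);
  } while (r == -1 && errno == EINTR);

  if (r == -1) {
    return {false, -1, errno};
  }
  return {true, WIFEXITED(status) ? WEXITSTATUS(status) : -1, 0};
}
```

With the default {{SIGCHLD}} disposition, the first {{reap(pid)}} on a child created by {{forkChild(0)}} returns the exit code 0, while a second {{reap(pid)}} reports {{ECHILD}}. If some other code path in the process waits on the child first, the caller that expected the status is the one left with the error, which would match the symptom above.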



