[
https://issues.apache.org/jira/browse/MESOS-9208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603662#comment-16603662
]
Chun-Hung Hsiao edited comment on MESOS-9208 at 9/4/18 10:25 PM:
-----------------------------------------------------------------
An interesting observation is that in good runs, the time between the
following two log lines is about 150 ms:
{noformat}
I0904 17:07:45.835467 1316 executor.cpp:693] Forked command at 1325
I0904 17:07:45.985908 1317 executor.cpp:994] Command exited with status 0 (pid: 1325)
{noformat}
But in the bad run, it took nearly 2 seconds (about 1.86 s) before the executor
reported that it failed to get the exit status:
{noformat}
I0904 17:13:05.861194 2022 executor.cpp:693] Forked command at 2027
I0904 17:13:07.720567 2022 executor.cpp:994] Failed to get exit status for Command (pid: 2027)
{noformat}
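To double-check these gaps, the glog timestamps can be subtracted directly. The sketch below assumes the standard glog prefix (severity letter plus `MMDD`, then `HH:MM:SS.ffffff`, then the thread id) and that both lines were logged on the same day; `glog_micros` and `elapsed_ms` are hypothetical helpers for illustration, not part of Mesos.

```python
def glog_micros(line):
    """Return the wall-clock time of a glog line as microseconds since midnight.

    Assumes the glog prefix "IMMDD HH:MM:SS.ffffff <tid> file:line] ...";
    the date field is ignored, so both lines must come from the same day.
    """
    clock = line.split()[1]  # e.g. "17:07:45.835467"
    hms, frac = clock.split(".")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    # glog prints six fractional digits, i.e. microseconds.
    return ((hours * 60 + minutes) * 60 + seconds) * 1_000_000 + int(frac)


def elapsed_ms(start_line, end_line):
    """Milliseconds between two glog lines logged on the same day."""
    return (glog_micros(end_line) - glog_micros(start_line)) / 1000


good = elapsed_ms(
    "I0904 17:07:45.835467 1316 executor.cpp:693] Forked command at 1325",
    "I0904 17:07:45.985908 1317 executor.cpp:994] Command exited with status 0 (pid: 1325)",
)
bad = elapsed_ms(
    "I0904 17:13:05.861194 2022 executor.cpp:693] Forked command at 2027",
    "I0904 17:13:07.720567 2022 executor.cpp:994] Failed to get exit status for Command (pid: 2027)",
)
print(f"good run: {good:.1f} ms, bad run: {bad:.1f} ms")
# prints: good run: 150.4 ms, bad run: 1859.4 ms
```

Working in integer microseconds avoids floating-point drift when subtracting two large seconds-since-midnight values.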
> Test `StorageLocalResourceProviderTest.ROOT_PublishResourcesReboot` is flaky.
> -----------------------------------------------------------------------------
>
> Key: MESOS-9208
> URL: https://issues.apache.org/jira/browse/MESOS-9208
> Project: Mesos
> Issue Type: Bug
> Components: test
> Affects Versions: 1.8.0
> Reporter: Chun-Hung Hsiao
> Priority: Major
> Labels: flaky-test, storage
> Attachments: bad_run.txt
>
>
> Test {{StorageLocalResourceProviderTest.ROOT_PublishResourcesReboot}} is
> observed to be flaky on ubuntu-16.04 with a plain build (i.e., no special
> configuration):
> {noformat}
> ../../src/tests/storage_local_resource_provider_tests.cpp:2393
> Expected: TASK_FINISHED
> To be equal to: taskFinished->state()
> Which is: TASK_FAILED
> {noformat}
> However, further investigation shows that the task failed because of the
> following error:
> {noformat}
> executor.cpp:994] Failed to get exit status for Command (pid: 2027)
> {noformat}
> This indicates that the executor couldn't reap the forked child, possibly
> because the child had already been reaped elsewhere for an unknown reason.
> This doesn't look like flakiness specific to this particular test, so I'll
> leave the test enabled for now.
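The "already reaped" hypothesis can be illustrated with a minimal POSIX sketch: once some code path collects a child's exit status, any later `waitpid` on that same pid fails with `ECHILD` (surfaced in Python as `ChildProcessError`), so the status is simply gone. This is not the Mesos executor's code; `reap_twice` is a hypothetical helper that only demonstrates the failure mode, and it assumes a POSIX system where `SIGCHLD` has its default disposition.

```python
import errno
import os


def reap_twice():
    """Fork a child, reap it once, then try to reap it again.

    Returns (exit status seen by the first reaper, errno seen by the second).
    """
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately with status 0

    # First reaper: some other code path collects the child's exit status.
    _, status = os.waitpid(pid, 0)

    # Second reaper: the code that actually wants the status finds nothing left.
    try:
        os.waitpid(pid, 0)
    except ChildProcessError as exc:  # raised when waitpid fails with ECHILD
        return os.WEXITSTATUS(status), exc.errno
    raise AssertionError("second waitpid unexpectedly succeeded")


first_status, second_errno = reap_twice()
print(first_status, second_errno == errno.ECHILD)  # prints: 0 True
```

Whoever reaps first gets the real exit status; the second caller can only report that the status is unavailable, which matches the shape of the executor's "Failed to get exit status" message.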