[ https://issues.apache.org/jira/browse/MESOS-9208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603662#comment-16603662 ]

Chun-Hung Hsiao edited comment on MESOS-9208 at 9/4/18 10:25 PM:
-----------------------------------------------------------------

An interesting observation is that in good runs, the duration between the following log lines is about 150ms:
{noformat}
I0904 17:07:45.835467 1316 executor.cpp:693] Forked command at 1325
I0904 17:07:45.985908 1317 executor.cpp:994] Command exited with status 0 (pid: 1325)
{noformat}
But in the bad run it took nearly 2 seconds:
{noformat}
I0904 17:13:05.861194 2022 executor.cpp:693] Forked command at 2027
I0904 17:13:07.720567 2022 executor.cpp:994] Failed to get exit status for Command (pid: 2027)
{noformat}


> Test `StorageLocalResourceProviderTest.ROOT_PublishResourcesReboot` is flaky.
> -----------------------------------------------------------------------------
>
>                 Key: MESOS-9208
>                 URL: https://issues.apache.org/jira/browse/MESOS-9208
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 1.8.0
>            Reporter: Chun-Hung Hsiao
>            Priority: Major
>              Labels: flaky-test, storage
>         Attachments: bad_run.txt
>
>
> Test {{StorageLocalResourceProviderTest.ROOT_PublishResourcesReboot}} is
> observed to be flaky on ubuntu-16.04 with a plain build (i.e., no special
> configuration):
> {noformat}
> ../../src/tests/storage_local_resource_provider_tests.cpp:2393
>       Expected: TASK_FINISHED
> To be equal to: taskFinished->state()
>       Which is: TASK_FAILED{noformat}
> However, further investigation shows that the task failed due to the
> following error:
> {noformat}
> executor.cpp:994] Failed to get exit status for Command (pid: 2027){noformat}
> This indicates that the executor couldn't reap the forked child, possibly
> because it had been mysteriously reaped already.
> This doesn't sound like flakiness specific to this particular test, so I'll
> leave the test enabled for now.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
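
The "already reaped" hypothesis in the description can be illustrated in isolation. The following is a minimal sketch, not Mesos code: {{reapTwice}} and {{ReapResult}} are hypothetical names, and it assumes a POSIX system where SIGCHLD has its default disposition. It forks a child, reaps it once with {{waitpid()}}, then tries to reap the same pid again; the second attempt is the situation a reaper would be in if something else had collected the child first.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>

// Outcome of two consecutive reap attempts on the same child.
// (Hypothetical type, introduced only for this illustration.)
struct ReapResult {
  bool firstReaped;   // first waitpid() returned the child's pid
  int exitStatus;     // exit code observed by the first waitpid()
  bool secondFailed;  // second waitpid() returned -1
  int secondErrno;    // errno left by the second waitpid()
};

// Forks a child that exits immediately with `code`, then waits on it twice.
ReapResult reapTwice(int code) {
  // Only the low 8 bits of an exit code are reported to the parent.
  assert(code >= 0 && code <= 255);

  ReapResult result{false, -1, false, 0};

  pid_t pid = fork();
  if (pid < 0) {
    return result;  // fork failed; nothing to reap
  }
  if (pid == 0) {
    _exit(code);  // child: exit without running parent's atexit handlers
  }

  // First reap: collects the exit status and releases the zombie.
  int status = 0;
  if (waitpid(pid, &status, 0) == pid && WIFEXITED(status)) {
    result.firstReaped = true;
    result.exitStatus = WEXITSTATUS(status);
  }

  // Second reap: the child no longer exists, so no status is available.
  errno = 0;
  if (waitpid(pid, &status, 0) == -1) {
    result.secondFailed = true;
    result.secondErrno = errno;
  }
  return result;
}
```

Calling {{reapTwice(0)}} should report a successful first reap with exit status 0 and a failed second reap with {{errno}} set to {{ECHILD}}. Whether the executor's "Failed to get exit status" message comes from this exact condition is not established by the logs above; the sketch only shows that an exit status cannot be obtained once a child has been reaped.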