[ 
https://issues.apache.org/jira/browse/MESOS-9208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603662#comment-16603662
 ] 

Chun-Hung Hsiao edited comment on MESOS-9208 at 9/4/18 10:31 PM:
-----------------------------------------------------------------

An interesting observation is that in good runs, the duration between the 
following log lines is about 150ms:
{noformat}
I0904 17:07:45.835467  1316 executor.cpp:693] Forked command at 1325
I0904 17:07:45.836033 29817 slave.cpp:5269] Handling status update TASK_RUNNING 
(Status UUID: d833aadb-6973-4a97-a118-61d41f36524f) for task 
5b716438-87ae-4ea6-b3b9-5d08a462cbc9 of framework 
5af3c866-4f98-416b-84a2-b63f7aa71dfd-0000 from executor(1)@172.16.10.215:41460
I0904 17:07:45.836513 29817 task_status_update_manager.cpp:328] Received task 
status update TASK_RUNNING (Status UUID: d833aadb-6973-4a97-a118-61d41f36524f) 
for task 5b716438-87ae-4ea6-b3b9-5d08a462cbc9 of framework 
5af3c866-4f98-416b-84a2-b63f7aa71dfd-0000
I0904 17:07:45.836566 29817 task_status_update_manager.cpp:383] Forwarding task 
status update TASK_RUNNING (Status UUID: d833aadb-6973-4a97-a118-61d41f36524f) 
for task 5b716438-87ae-4ea6-b3b9-5d08a462cbc9 of framework 
5af3c866-4f98-416b-84a2-b63f7aa71dfd-0000 to the agent
I0904 17:07:45.836644 29817 slave.cpp:5761] Forwarding the update TASK_RUNNING 
(Status UUID: d833aadb-6973-4a97-a118-61d41f36524f) for task 
5b716438-87ae-4ea6-b3b9-5d08a462cbc9 of framework 
5af3c866-4f98-416b-84a2-b63f7aa71dfd-0000 to [email protected]:34394
I0904 17:07:45.836715 29817 slave.cpp:5654] Task status update manager 
successfully handled status update TASK_RUNNING (Status UUID: 
d833aadb-6973-4a97-a118-61d41f36524f) for task 
5b716438-87ae-4ea6-b3b9-5d08a462cbc9 of framework 
5af3c866-4f98-416b-84a2-b63f7aa71dfd-0000
I0904 17:07:45.836736 29817 slave.cpp:5670] Sending acknowledgement for status 
update TASK_RUNNING (Status UUID: d833aadb-6973-4a97-a118-61d41f36524f) for 
task 5b716438-87ae-4ea6-b3b9-5d08a462cbc9 of framework 
5af3c866-4f98-416b-84a2-b63f7aa71dfd-0000 to executor(1)@172.16.10.215:41460
I0904 17:07:45.836864 29817 master.cpp:8332] Status update TASK_RUNNING (Status 
UUID: d833aadb-6973-4a97-a118-61d41f36524f) for task 
5b716438-87ae-4ea6-b3b9-5d08a462cbc9 of framework 
5af3c866-4f98-416b-84a2-b63f7aa71dfd-0000 from agent 
5af3c866-4f98-416b-84a2-b63f7aa71dfd-S0 at slave(752)@172.16.10.215:34394 
(ip-172-16-10-215.ec2.internal)
I0904 17:07:45.836895 29817 master.cpp:8389] Forwarding status update 
TASK_RUNNING (Status UUID: d833aadb-6973-4a97-a118-61d41f36524f) for task 
5b716438-87ae-4ea6-b3b9-5d08a462cbc9 of framework 
5af3c866-4f98-416b-84a2-b63f7aa71dfd-0000
I0904 17:07:45.837005 29817 master.cpp:10879] Updating the state of task 
5b716438-87ae-4ea6-b3b9-5d08a462cbc9 of framework 
5af3c866-4f98-416b-84a2-b63f7aa71dfd-0000 (latest state: TASK_RUNNING, status 
update state: TASK_RUNNING)
I0904 17:07:45.837113 29817 sched.cpp:1022] Scheduler::statusUpdate took 46577ns
I0904 17:07:45.837211 29817 master.cpp:6198] Processing ACKNOWLEDGE call for 
status d833aadb-6973-4a97-a118-61d41f36524f for task 
5b716438-87ae-4ea6-b3b9-5d08a462cbc9 of framework 
5af3c866-4f98-416b-84a2-b63f7aa71dfd-0000 (default) at 
[email protected]:34394 on agent 
5af3c866-4f98-416b-84a2-b63f7aa71dfd-S0
I0904 17:07:45.837340 29817 task_status_update_manager.cpp:401] Received task 
status update acknowledgement (UUID: d833aadb-6973-4a97-a118-61d41f36524f) for 
task 5b716438-87ae-4ea6-b3b9-5d08a462cbc9 of framework 
5af3c866-4f98-416b-84a2-b63f7aa71dfd-0000
I0904 17:07:45.837407 29817 slave.cpp:4505] Task status update manager 
successfully handled status update acknowledgement (UUID: 
d833aadb-6973-4a97-a118-61d41f36524f) for task 
5b716438-87ae-4ea6-b3b9-5d08a462cbc9 of framework 
5af3c866-4f98-416b-84a2-b63f7aa71dfd-0000
I0904 17:07:45.838316 29813 hierarchical.cpp:2386] Filtered offer with cpus:2; 
mem:1024; disk:1024; ports:[31000-32000] on agent 
5af3c866-4f98-416b-84a2-b63f7aa71dfd-S0 for role storage of framework 
5af3c866-4f98-416b-84a2-b63f7aa71dfd-0000
I0904 17:07:45.838372 29813 hierarchical.cpp:1564] Performed allocation for 1 
agents in 165353ns
I0904 17:07:45.888976 29816 hierarchical.cpp:2386] Filtered offer with cpus:2; 
mem:1024; disk:1024; ports:[31000-32000] on agent 
5af3c866-4f98-416b-84a2-b63f7aa71dfd-S0 for role storage of framework 
5af3c866-4f98-416b-84a2-b63f7aa71dfd-0000
I0904 17:07:45.889056 29816 hierarchical.cpp:1564] Performed allocation for 1 
agents in 224502ns
I0904 17:07:45.939352 29812 hierarchical.cpp:2386] Filtered offer with cpus:2; 
mem:1024; disk:1024; ports:[31000-32000] on agent 
5af3c866-4f98-416b-84a2-b63f7aa71dfd-S0 for role storage of framework 
5af3c866-4f98-416b-84a2-b63f7aa71dfd-0000
I0904 17:07:45.939410 29812 hierarchical.cpp:1564] Performed allocation for 1 
agents in 163889ns
I0904 17:07:45.985908  1317 executor.cpp:994] Command exited with status 0 
(pid: 1325)
{noformat}
But in the bad run it took nearly 2 seconds:
{noformat}
I0904 17:13:05.861194  2022 executor.cpp:693] Forked command at 2027
I0904 17:13:05.858180 30521 task_status_update_manager.cpp:328] Received task 
status update TASK_STARTING (Status UUID: 7ed5c225-da67-4aa8-91bb-81078750215d) 
for task 8a6c78ae-ffbb-4f17-b977-3c030f2b0353 of framework 
c106904d-8084-4d0b-9a51-2cc1f0b55945-0000
I0904 17:13:07.720315 30521 task_status_update_manager.cpp:507] Creating 
StatusUpdate stream for task 8a6c78ae-ffbb-4f17-b977-3c030f2b0353 of framework 
c106904d-8084-4d0b-9a51-2cc1f0b55945-0000
I0904 17:13:07.720567  2022 executor.cpp:994] Failed to get exit status for 
Command (pid: 2027)
{noformat}
And there was no logging activity at all in 17:13:06.


was (Author: chhsia0):
An interesting observation is that in good runs, the duration between the 
following log lines is about 150ms:
{noformat}
I0904 17:07:45.835467  1316 executor.cpp:693] Forked command at 1325
I0904 17:07:45.985908  1317 executor.cpp:994] Command exited with status 0 
(pid: 1325)
{noformat}
But in the bad run it took nearly 2 seconds:
{noformat}
I0904 17:13:05.861194  2022 executor.cpp:693] Forked command at 2027
I0904 17:13:07.720567  2022 executor.cpp:994] Failed to get exit status for 
Command (pid: 2027)
{noformat}

> Test `StorageLocalResourceProviderTest.ROOT_PublishResourcesReboot` is flaky.
> -----------------------------------------------------------------------------
>
>                 Key: MESOS-9208
>                 URL: https://issues.apache.org/jira/browse/MESOS-9208
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 1.8.0
>            Reporter: Chun-Hung Hsiao
>            Priority: Major
>              Labels: flaky-test, storage
>         Attachments: bad_run.txt
>
>
> Test {{StorageLocalResourceProviderTest.ROOT_PublishResourcesReboot}} is 
> observed to be flaky on ubuntu-16.04 with a plain build (i.e., no special 
> configuration):
> {noformat}
> ../../src/tests/storage_local_resource_provider_tests.cpp:2393
>       Expected: TASK_FINISHED
> To be equal to: taskFinished->state()
>       Which is: TASK_FAILED{noformat}
> However, further investigation shows that the task was failed due to the 
> following error:
> {noformat}
> executor.cpp:994] Failed to get exit status for Command (pid: 2027){noformat}
> Which indicates that the executor couldn't reap the forked child, possibly 
> because it had been mysteriously reaped already.
> This doesn't sound like a flakiness specific to this particular test, so I'll 
> leave the test enabled for now.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to