Alexander Rukletsov created MESOS-6974:
------------------------------------------

             Summary: DefaultExecutorTest.CommitSuicideOnTaskFailure test is 
flaky.
                 Key: MESOS-6974
                 URL: https://issues.apache.org/jira/browse/MESOS-6974
             Project: Mesos
          Issue Type: Bug
          Components: test
    Affects Versions: 1.1.0
         Environment: Mac OS 10.11.6 with clang-703.0.31
            Reporter: Alexander Rukletsov
            Assignee: Anand Mazumdar
            Priority: Critical


This test seems to be racy. For some reason the shutdown process in the default 
executor stalls. Sometimes the executor manages to quit (well, segfault) before 
the agent tries to resend the last task status update, but sometimes not, which 
leads to the test failure. It seems that the executor should not hang during 
termination, which may indicate a bug in the executor and not just in the test.

{noformat}
I0123 11:52:29.001549 3211264 master.cpp:5855] Status update TASK_FAILED (UUID: 
699ee239-4eae-4b51-a68c-80e56dfd01dd) for task 
5cfe9ce6-f53b-4906-bb76-3ca6179489bc of framework 
3c207374-2ca5-4e9a-a138-dcb2eabb848e-0000 from agent 
3c207374-2ca5-4e9a-a138-dcb2eabb848e-S0 at slave(5)@192.168.9.40:60268 (alexr)
I0123 11:52:29.001581 3211264 master.cpp:5917] Forwarding status update 
TASK_FAILED (UUID: 699ee239-4eae-4b51-a68c-80e56dfd01dd) for task 
5cfe9ce6-f53b-4906-bb76-3ca6179489bc of framework 
3c207374-2ca5-4e9a-a138-dcb2eabb848e-0000
I0123 11:52:29.001713 3211264 master.cpp:7956] Updating the state of task 
5cfe9ce6-f53b-4906-bb76-3ca6179489bc of framework 
3c207374-2ca5-4e9a-a138-dcb2eabb848e-0000 (latest state: TASK_FAILED, status 
update state: TASK_FAILED)
I0123 11:52:29.002049 528384 hierarchical.cpp:1011] Recovered cpus(*):0.1; 
mem(*):32; disk(*):32 (total: cpus(*):2; mem(*):1024; disk(*):1024; 
ports(*):[31000-32000], allocated: cpus(*):0.1; mem(*):32; disk(*):32) on agent 
3c207374-2ca5-4e9a-a138-dcb2eabb848e-S0 from framework 
3c207374-2ca5-4e9a-a138-dcb2eabb848e-0000
I0123 11:52:29.002229 4284416 scheduler.cpp:676] Enqueuing event UPDATE 
received from http://192.168.9.40:60268/master/api/v1/scheduler
I0123 11:52:29.784299 3211264 hierarchical.cpp:1772] No inverse offers to send 
out!
I0123 11:52:29.784381 3211264 hierarchical.cpp:1279] Performed allocation for 1 
agents in 726us
I0123 11:52:29.784638 528384 master.cpp:6671] Sending 1 offers to framework 
3c207374-2ca5-4e9a-a138-dcb2eabb848e-0000 (default)
I0123 11:52:29.785650 4284416 scheduler.cpp:676] Enqueuing event OFFERS 
received from http://192.168.9.40:60268/master/api/v1/scheduler
I0123 11:52:30.003669 3211264 default_executor.cpp:693] Shutting down
E0123 11:52:30.004431 4820992 process.cpp:2419] Failed to shutdown socket with 
fd 13: Socket is not connected
E0123 11:52:30.005080 4820992 process.cpp:2419] Failed to shutdown socket with 
fd 11: Socket is not connected
E0123 11:52:30.005573 4820992 process.cpp:2419] Failed to shutdown socket with 
fd 12: Socket is not connected
W0123 11:52:30.005645 2138112 process.cpp:3022] Attempted to spawn a process 
(__shutdown_executor__(1)@192.168.9.40:60313) after finalizing libprocess!
W0123 11:52:30.005695 2138112 process.cpp:3022] Attempted to spawn a process 
(__async_executor__(6)@192.168.9.40:60313) after finalizing libprocess!
E0123 11:52:30.005971 4820992 process.cpp:2419] Failed to shutdown socket with 
fd 14: Socket is not connected
I0123 11:52:30.789027 528384 hierarchical.cpp:1677] No allocations performed
I0123 11:52:30.789062 528384 hierarchical.cpp:1772] No inverse offers to send 
out!
I0123 11:52:30.789083 528384 hierarchical.cpp:1279] Performed allocation for 1 
agents in 110us
I0123 11:52:31.793439 4284416 hierarchical.cpp:1677] No allocations performed
I0123 11:52:31.793472 4284416 hierarchical.cpp:1772] No inverse offers to send 
out!
I0123 11:52:31.793485 4284416 hierarchical.cpp:1279] Performed allocation for 1 
agents in 99us
I0123 11:52:32.797495 3211264 hierarchical.cpp:1677] No allocations performed
I0123 11:52:32.797535 3211264 hierarchical.cpp:1772] No inverse offers to send 
out!
I0123 11:52:32.797554 3211264 hierarchical.cpp:1279] Performed allocation for 1 
agents in 120us
I0123 11:52:33.798820 4284416 hierarchical.cpp:1677] No allocations performed
I0123 11:52:33.798849 4284416 hierarchical.cpp:1772] No inverse offers to send 
out!
I0123 11:52:33.798862 4284416 hierarchical.cpp:1279] Performed allocation for 1 
agents in 91us
I0123 11:52:34.801596 3747840 hierarchical.cpp:1677] No allocations performed
I0123 11:52:34.801638 3747840 hierarchical.cpp:1772] No inverse offers to send 
out!
I0123 11:52:34.801659 3747840 hierarchical.cpp:1279] Performed allocation for 1 
agents in 134us
I0123 11:52:35.804436 2674688 hierarchical.cpp:1677] No allocations performed
I0123 11:52:35.804479 2674688 hierarchical.cpp:1772] No inverse offers to send 
out!
I0123 11:52:35.804500 2674688 hierarchical.cpp:1279] Performed allocation for 1 
agents in 148us
I0123 11:52:36.808641 3747840 hierarchical.cpp:1677] No allocations performed
I0123 11:52:36.808677 3747840 hierarchical.cpp:1772] No inverse offers to send 
out!
I0123 11:52:36.808696 3747840 hierarchical.cpp:1279] Performed allocation for 1 
agents in 115us
I0123 11:52:37.812849 2674688 hierarchical.cpp:1677] No allocations performed
I0123 11:52:37.812885 2674688 hierarchical.cpp:1772] No inverse offers to send 
out!
I0123 11:52:37.812904 2674688 hierarchical.cpp:1279] Performed allocation for 1 
agents in 134us
I0123 11:52:38.817015 3747840 hierarchical.cpp:1677] No allocations performed
I0123 11:52:38.817044 3747840 hierarchical.cpp:1772] No inverse offers to send 
out!
I0123 11:52:38.817059 3747840 hierarchical.cpp:1279] Performed allocation for 1 
agents in 92us
W0123 11:52:39.002764 1064960 status_update_manager.cpp:478] Resending status 
update TASK_FAILED (UUID: 699ee239-4eae-4b51-a68c-80e56dfd01dd) for task 
5cfe9ce6-f53b-4906-bb76-3ca6179489bc of framework 
3c207374-2ca5-4e9a-a138-dcb2eabb848e-0000
I0123 11:52:39.002830 1064960 status_update_manager.cpp:377] Forwarding update 
TASK_FAILED (UUID: 699ee239-4eae-4b51-a68c-80e56dfd01dd) for task 
5cfe9ce6-f53b-4906-bb76-3ca6179489bc of framework 
3c207374-2ca5-4e9a-a138-dcb2eabb848e-0000 to the agent
I0123 11:52:39.002985 2138112 slave.cpp:4196] Forwarding the update TASK_FAILED 
(UUID: 699ee239-4eae-4b51-a68c-80e56dfd01dd) for task 
5cfe9ce6-f53b-4906-bb76-3ca6179489bc of framework 
3c207374-2ca5-4e9a-a138-dcb2eabb848e-0000 to [email protected]:60268
I0123 11:52:39.003178 3211264 master.cpp:5855] Status update TASK_FAILED (UUID: 
699ee239-4eae-4b51-a68c-80e56dfd01dd) for task 
5cfe9ce6-f53b-4906-bb76-3ca6179489bc of framework 
3c207374-2ca5-4e9a-a138-dcb2eabb848e-0000 from agent 
3c207374-2ca5-4e9a-a138-dcb2eabb848e-S0 at slave(5)@192.168.9.40:60268 (alexr)
I0123 11:52:39.003211 3211264 master.cpp:5917] Forwarding status update 
TASK_FAILED (UUID: 699ee239-4eae-4b51-a68c-80e56dfd01dd) for task 
5cfe9ce6-f53b-4906-bb76-3ca6179489bc of framework 
3c207374-2ca5-4e9a-a138-dcb2eabb848e-0000
I0123 11:52:39.003393 3211264 master.cpp:7956] Updating the state of task 
5cfe9ce6-f53b-4906-bb76-3ca6179489bc of framework 
3c207374-2ca5-4e9a-a138-dcb2eabb848e-0000 (latest state: TASK_FAILED, status 
update state: TASK_FAILED)
I0123 11:52:39.004077 1064960 scheduler.cpp:676] Enqueuing event UPDATE 
received from http://192.168.9.40:60268/master/api/v1/scheduler
../../../src/tests/default_executor_tests.cpp:930: Failure
Mock function called more times than expected - returning directly.
    Function call: update(0x7fff53295420, @0x7fbe51902530 32-byte object <D0-04 
70-15 01-00 00-00 00-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 50-25 90-51 
BE-7F 00-00>)
         Expected: to be called twice
           Actual: called 3 times - over-saturated and active
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to