[ 
https://issues.apache.org/jira/browse/MESOS-6974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-6974:
-------------------------------
    Priority: Major  (was: Critical)

Downgrading from "Critical" to "Major" -- AFAICS this is a flaky test that 
should be fixed, but it is no more serious than the other known flaky tests 
(of which there are unfortunately quite a few).

> DefaultExecutorTest.CommitSuicideOnTaskFailure test is flaky.
> -------------------------------------------------------------
>
>                 Key: MESOS-6974
>                 URL: https://issues.apache.org/jira/browse/MESOS-6974
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 1.1.0
>         Environment: Mac OS 10.11.6 with clang-703.0.31
>            Reporter: Alexander Rukletsov
>            Assignee: Anand Mazumdar
>              Labels: flaky-test
>         Attachments: default_executor_tests.txt
>
>
> This test seems to be racy. For some reason the shutdown process in the 
> default executor stalls. Sometimes the executor manages to quit (well, 
> segfault) before the agent resends the last task status update; sometimes it 
> does not, and the resent update reaches the scheduler as a third update() 
> call, which fails the test. It seems that the executor should not hang 
> during termination, so this may indicate a bug in the executor and not just 
> in the test.
> {noformat}
> I0123 11:52:29.001549 3211264 master.cpp:5855] Status update TASK_FAILED 
> (UUID: 699ee239-4eae-4b51-a68c-80e56dfd01dd) for task 
> 5cfe9ce6-f53b-4906-bb76-3ca6179489bc of framework 
> 3c207374-2ca5-4e9a-a138-dcb2eabb848e-0000 from agent 
> 3c207374-2ca5-4e9a-a138-dcb2eabb848e-S0 at slave(5)@192.168.9.40:60268 (alexr)
> I0123 11:52:29.001581 3211264 master.cpp:5917] Forwarding status update 
> TASK_FAILED (UUID: 699ee239-4eae-4b51-a68c-80e56dfd01dd) for task 
> 5cfe9ce6-f53b-4906-bb76-3ca6179489bc of framework 
> 3c207374-2ca5-4e9a-a138-dcb2eabb848e-0000
> I0123 11:52:29.001713 3211264 master.cpp:7956] Updating the state of task 
> 5cfe9ce6-f53b-4906-bb76-3ca6179489bc of framework 
> 3c207374-2ca5-4e9a-a138-dcb2eabb848e-0000 (latest state: TASK_FAILED, status 
> update state: TASK_FAILED)
> I0123 11:52:29.002049 528384 hierarchical.cpp:1011] Recovered cpus(*):0.1; 
> mem(*):32; disk(*):32 (total: cpus(*):2; mem(*):1024; disk(*):1024; 
> ports(*):[31000-32000], allocated: cpus(*):0.1; mem(*):32; disk(*):32) on 
> agent 3c207374-2ca5-4e9a-a138-dcb2eabb848e-S0 from framework 
> 3c207374-2ca5-4e9a-a138-dcb2eabb848e-0000
> I0123 11:52:29.002229 4284416 scheduler.cpp:676] Enqueuing event UPDATE 
> received from http://192.168.9.40:60268/master/api/v1/scheduler
> I0123 11:52:29.784299 3211264 hierarchical.cpp:1772] No inverse offers to 
> send out!
> I0123 11:52:29.784381 3211264 hierarchical.cpp:1279] Performed allocation for 
> 1 agents in 726us
> I0123 11:52:29.784638 528384 master.cpp:6671] Sending 1 offers to framework 
> 3c207374-2ca5-4e9a-a138-dcb2eabb848e-0000 (default)
> I0123 11:52:29.785650 4284416 scheduler.cpp:676] Enqueuing event OFFERS 
> received from http://192.168.9.40:60268/master/api/v1/scheduler
> I0123 11:52:30.003669 3211264 default_executor.cpp:693] Shutting down
> E0123 11:52:30.004431 4820992 process.cpp:2419] Failed to shutdown socket 
> with fd 13: Socket is not connected
> E0123 11:52:30.005080 4820992 process.cpp:2419] Failed to shutdown socket 
> with fd 11: Socket is not connected
> E0123 11:52:30.005573 4820992 process.cpp:2419] Failed to shutdown socket 
> with fd 12: Socket is not connected
> W0123 11:52:30.005645 2138112 process.cpp:3022] Attempted to spawn a process 
> (__shutdown_executor__(1)@192.168.9.40:60313) after finalizing libprocess!
> W0123 11:52:30.005695 2138112 process.cpp:3022] Attempted to spawn a process 
> (__async_executor__(6)@192.168.9.40:60313) after finalizing libprocess!
> E0123 11:52:30.005971 4820992 process.cpp:2419] Failed to shutdown socket 
> with fd 14: Socket is not connected
> I0123 11:52:30.789027 528384 hierarchical.cpp:1677] No allocations performed
> I0123 11:52:30.789062 528384 hierarchical.cpp:1772] No inverse offers to send 
> out!
> I0123 11:52:30.789083 528384 hierarchical.cpp:1279] Performed allocation for 
> 1 agents in 110us
> I0123 11:52:31.793439 4284416 hierarchical.cpp:1677] No allocations performed
> I0123 11:52:31.793472 4284416 hierarchical.cpp:1772] No inverse offers to 
> send out!
> I0123 11:52:31.793485 4284416 hierarchical.cpp:1279] Performed allocation for 
> 1 agents in 99us
> I0123 11:52:32.797495 3211264 hierarchical.cpp:1677] No allocations performed
> I0123 11:52:32.797535 3211264 hierarchical.cpp:1772] No inverse offers to 
> send out!
> I0123 11:52:32.797554 3211264 hierarchical.cpp:1279] Performed allocation for 
> 1 agents in 120us
> I0123 11:52:33.798820 4284416 hierarchical.cpp:1677] No allocations performed
> I0123 11:52:33.798849 4284416 hierarchical.cpp:1772] No inverse offers to 
> send out!
> I0123 11:52:33.798862 4284416 hierarchical.cpp:1279] Performed allocation for 
> 1 agents in 91us
> I0123 11:52:34.801596 3747840 hierarchical.cpp:1677] No allocations performed
> I0123 11:52:34.801638 3747840 hierarchical.cpp:1772] No inverse offers to 
> send out!
> I0123 11:52:34.801659 3747840 hierarchical.cpp:1279] Performed allocation for 
> 1 agents in 134us
> I0123 11:52:35.804436 2674688 hierarchical.cpp:1677] No allocations performed
> I0123 11:52:35.804479 2674688 hierarchical.cpp:1772] No inverse offers to 
> send out!
> I0123 11:52:35.804500 2674688 hierarchical.cpp:1279] Performed allocation for 
> 1 agents in 148us
> I0123 11:52:36.808641 3747840 hierarchical.cpp:1677] No allocations performed
> I0123 11:52:36.808677 3747840 hierarchical.cpp:1772] No inverse offers to 
> send out!
> I0123 11:52:36.808696 3747840 hierarchical.cpp:1279] Performed allocation for 
> 1 agents in 115us
> I0123 11:52:37.812849 2674688 hierarchical.cpp:1677] No allocations performed
> I0123 11:52:37.812885 2674688 hierarchical.cpp:1772] No inverse offers to 
> send out!
> I0123 11:52:37.812904 2674688 hierarchical.cpp:1279] Performed allocation for 
> 1 agents in 134us
> I0123 11:52:38.817015 3747840 hierarchical.cpp:1677] No allocations performed
> I0123 11:52:38.817044 3747840 hierarchical.cpp:1772] No inverse offers to 
> send out!
> I0123 11:52:38.817059 3747840 hierarchical.cpp:1279] Performed allocation for 
> 1 agents in 92us
> W0123 11:52:39.002764 1064960 status_update_manager.cpp:478] Resending status 
> update TASK_FAILED (UUID: 699ee239-4eae-4b51-a68c-80e56dfd01dd) for task 
> 5cfe9ce6-f53b-4906-bb76-3ca6179489bc of framework 
> 3c207374-2ca5-4e9a-a138-dcb2eabb848e-0000
> I0123 11:52:39.002830 1064960 status_update_manager.cpp:377] Forwarding 
> update TASK_FAILED (UUID: 699ee239-4eae-4b51-a68c-80e56dfd01dd) for task 
> 5cfe9ce6-f53b-4906-bb76-3ca6179489bc of framework 
> 3c207374-2ca5-4e9a-a138-dcb2eabb848e-0000 to the agent
> I0123 11:52:39.002985 2138112 slave.cpp:4196] Forwarding the update 
> TASK_FAILED (UUID: 699ee239-4eae-4b51-a68c-80e56dfd01dd) for task 
> 5cfe9ce6-f53b-4906-bb76-3ca6179489bc of framework 
> 3c207374-2ca5-4e9a-a138-dcb2eabb848e-0000 to master@192.168.9.40:60268
> I0123 11:52:39.003178 3211264 master.cpp:5855] Status update TASK_FAILED 
> (UUID: 699ee239-4eae-4b51-a68c-80e56dfd01dd) for task 
> 5cfe9ce6-f53b-4906-bb76-3ca6179489bc of framework 
> 3c207374-2ca5-4e9a-a138-dcb2eabb848e-0000 from agent 
> 3c207374-2ca5-4e9a-a138-dcb2eabb848e-S0 at slave(5)@192.168.9.40:60268 (alexr)
> I0123 11:52:39.003211 3211264 master.cpp:5917] Forwarding status update 
> TASK_FAILED (UUID: 699ee239-4eae-4b51-a68c-80e56dfd01dd) for task 
> 5cfe9ce6-f53b-4906-bb76-3ca6179489bc of framework 
> 3c207374-2ca5-4e9a-a138-dcb2eabb848e-0000
> I0123 11:52:39.003393 3211264 master.cpp:7956] Updating the state of task 
> 5cfe9ce6-f53b-4906-bb76-3ca6179489bc of framework 
> 3c207374-2ca5-4e9a-a138-dcb2eabb848e-0000 (latest state: TASK_FAILED, status 
> update state: TASK_FAILED)
> I0123 11:52:39.004077 1064960 scheduler.cpp:676] Enqueuing event UPDATE 
> received from http://192.168.9.40:60268/master/api/v1/scheduler
> ../../../src/tests/default_executor_tests.cpp:930: Failure
> Mock function called more times than expected - returning directly.
>     Function call: update(0x7fff53295420, @0x7fbe51902530 32-byte object 
> <D0-04 70-15 01-00 00-00 00-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 
> 50-25 90-51 BE-7F 00-00>)
>          Expected: to be called twice
>            Actual: called 3 times - over-saturated and active
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
