[ https://issues.apache.org/jira/browse/MESOS-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16716992#comment-16716992 ]
Benno Evers commented on MESOS-8096:
------------------------------------
Observed the same today in
`MesosContainerizer/DefaultExecutorTest.ROOT_ContainerStatusForTask/0`:
{noformat}
[ RUN ] MesosContainerizer/DefaultExecutorTest.ROOT_ContainerStatusForTask/0
[...]
I1210 18:51:52.317384 2570 default_executor.cpp:1126] Killing task 2506c623-0270-4126-aa0c-8eeda080e50d running in child container a1b3cf45-7361-484f-8095-4ae69dd5e777.17e50b81-46a7-4225-9c33-a0bf024618ec with SIGTERM signal
I1210 18:51:52.317389 2570 default_executor.cpp:1137] Scheduling escalation to SIGKILL in 3secs from now
I1210 18:51:52.317608 2570 default_executor.cpp:1126] Killing task 40e69403-db71-4902-af53-746d445a7489 running in child container a1b3cf45-7361-484f-8095-4ae69dd5e777.c544a951-c629-492a-bc09-b1a6c72740e2 with SIGTERM signal
I1210 18:51:52.317620 2570 default_executor.cpp:1137] Scheduling escalation to SIGKILL in 3secs from now
I1210 18:51:52.318428 15462 process.cpp:3588] Handling HTTP event for process 'slave(1107)' with path: '/slave(1107)/api/v1'
I1210 18:51:52.318593 15461 process.cpp:3588] Handling HTTP event for process 'slave(1107)' with path: '/slave(1107)/api/v1'
*** Aborted at 1544467912 (unix time) try "date -d @1544467912" if you are using GNU date ***
I1210 18:51:52.319488 15461 http.cpp:1157] HTTP POST for /slave(1107)/api/v1 from 172.16.10.38:60672
I1210 18:51:52.319586 15461 http.cpp:1157] HTTP POST for /slave(1107)/api/v1 from 172.16.10.38:60673
I1210 18:51:52.319697 15461 http.cpp:2797] Processing KILL_NESTED_CONTAINER call for container 'a1b3cf45-7361-484f-8095-4ae69dd5e777.17e50b81-46a7-4225-9c33-a0bf024618ec'
I1210 18:51:52.319808 15461 http.cpp:2797] Processing KILL_NESTED_CONTAINER call for container 'a1b3cf45-7361-484f-8095-4ae69dd5e777.c544a951-c629-492a-bc09-b1a6c72740e2'
I1210 18:51:52.319927 15461 containerizer.cpp:2839] Sending Terminated to container a1b3cf45-7361-484f-8095-4ae69dd5e777.17e50b81-46a7-4225-9c33-a0bf024618ec in RUNNING state
I1210 18:51:52.320010 15460 containerizer.cpp:2839] Sending Terminated to container a1b3cf45-7361-484f-8095-4ae69dd5e777.c544a951-c629-492a-bc09-b1a6c72740e2 in RUNNING state
PC: @ 0x7fd51d72d013 mesos::v1::scheduler::Mesos::send()
*** SIGSEGV (@0x0) received by PID 23718 (TID 0x7fd50f38b700) from PID 0; stack trace: ***
@ 0x7fd4e614aabc (unknown)
@ 0x7fd4e614f751 (unknown)
@ 0x7fd4e6142f58 (unknown)
@ 0x7fd51a3ae890 (unknown)
@ 0x7fd51d72d013 mesos::v1::scheduler::Mesos::send()
@ 0x558cee3c1808 _ZNK5mesos8internal5tests2v19scheduler23SendAcknowledgeActionP2INS_2v111FrameworkIDENS5_7AgentIDEE10gmock_ImplIFvPNS5_9scheduler5MesosERKNSA_12Event_UpdateEEE17gmock_PerformImplISC_SF_N7testing8internal12ExcessiveArgESL_SL_SL_SL_SL_SL_SL_EEvRKSt5tupleIJSC_SF_EET_T0_T1_T2_T3_T4_T5_T6_T7_T8_
@ 0x558cee3c1990 _ZN5mesos8internal5tests2v19scheduler23SendAcknowledgeActionP2INS_2v111FrameworkIDENS5_7AgentIDEE10gmock_ImplIFvPNS5_9scheduler5MesosERKNSA_12Event_UpdateEEE7PerformERKSt5tupleIJSC_SF_EE
@ 0x558cee2c430f _ZN7testing8internal12DoBothActionI17PromiseArgActionPILi1EPN7process7PromiseIN5mesos2v19scheduler12Event_UpdateEEEENS5_8internal5tests2v19scheduler23SendAcknowledgeActionP2INS6_11FrameworkIDENS6_7AgentIDEEEE4ImplIFvPNS7_5MesosERKS8_EE7PerformERKSt5tupleIJSN_SP_EE
@ 0x558cee2e9f57 testing::internal::FunctionMockerBase<>::UntypedPerformAction()
@ 0x558cef7b184f testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith()
@ 0x558cee3d075d mesos::internal::tests::scheduler::MockHTTPScheduler<>::events()
@ 0x558cee34cda0 std::_Function_handler<>::_M_invoke()
@ 0x7fd51d731098 process::AsyncExecutorProcess::execute<>()
@ 0x7fd51d74061b _ZN5cpp176invokeIZN7process8dispatchI7NothingNS1_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeISA_SaISA_EEEEESE_SK_RSE_EENS1_6FutureIT_EERKNS1_3PIDIT0_EEMSQ_FSN_T1_T2_EOT3_OT4_EUlSt10unique_ptrINS1_7PromiseIS3_EESt14default_deleteIS14_EEOSI_OSE_PNS1_11ProcessBaseEE_IS17_SI_SE_S1B_EEEDTclcl7forwardISN_Efp_Espcl7forwardIT0_Efp0_EEEOSN_DpOS1D_
@ 0x7fd51e5205d1 process::ProcessBase::consume()
@ 0x7fd51e537543 process::ProcessManager::resume()
@ 0x7fd51e53d116 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
@ 0x7fd51ab89970 (unknown)
@ 0x7fd51a3a7064 start_thread
@ 0x7fd51a0dc62d (unknown)
E1210 18:51:52.501421 2574 default_executor.cpp:801] Connection for waiting on child container a1b3cf45-7361-484f-8095-4ae69dd5e777.17e50b81-46a7-4225-9c33-a0bf024618ec of task '2506c623-0270-4126-aa0c-8eeda080e50d' interrupted: Disconnected
{noformat}
> Enqueueing events in MockHTTPScheduler can lead to segfaults.
> -------------------------------------------------------------
>
> Key: MESOS-8096
> URL: https://issues.apache.org/jira/browse/MESOS-8096
> Project: Mesos
> Issue Type: Bug
> Components: scheduler driver, test
> Environment: Fedora 23, Ubuntu 14.04, Ubuntu 16
> Reporter: Alexander Rukletsov
> Assignee: Alexander Rukletsov
> Priority: Major
> Labels: flaky-test, integration, mesosphere
> Attachments: AsyncExecutorProcess-badrun-1.txt,
> AsyncExecutorProcess-badrun-2.txt, AsyncExecutorProcess-badrun-3.txt,
> mesos-8096-1.txt, mesos-8096-2.txt, mesos-8096-3.txt,
> scheduler-shutdown-invalid-driver-2.txt, scheduler-shutdown-invalid-driver.txt
>
>
> Various tests segfault for an as yet unknown reason. Comparing the attached
> logs suggests that the problem might be in the scheduler's event queue.