Till Toenshoff created MESOS-9216:
-------------------------------------
Summary: SchedulerTest.SchedulerFailover is flaky and times out.
Key: MESOS-9216
URL: https://issues.apache.org/jira/browse/MESOS-9216
Project: Mesos
Issue Type: Bug
Components: scheduler api, test
Environment: debian-9, centos-6, ubuntu-16.04, ..., macOS
Reporter: Till Toenshoff
This is easy to reproduce for me on macOS, but it has also been observed on the ASF CI:
{noformat}
$ ./bin/mesos-tests.sh --gtest_filter="*SchedulerTest.SchedulerFailover*" --gtest_repeat=100 --gtest_break_on_failure --verbose
{noformat}
{noformat}
[...]
Repeating all tests (iteration 61) . . .
[...]
[ RUN ] ContentType/SchedulerTest.SchedulerFailover/1
I0907 11:31:42.409766 311620992 cluster.cpp:173] Creating default 'local'
authorizer
I0907 11:31:42.411957 110624768 master.cpp:413] Master
4450e893-595f-48c2-9ea2-31325fda2c76 (lobomacpro4.fritz.box) started on
192.168.178.20:54546
I0907 11:31:42.411975 110624768 master.cpp:416] Flags at startup: --acls=""
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins"
--allocation_interval="1secs" --allocator="hierarchical"
--authenticate_agents="true" --authenticate_frameworks="true"
--authenticate_http_frameworks="true" --authenticate_http_readonly="true"
--authenticate_http_readwrite="true" --authentication_v0_timeout="15secs"
--authenticators="crammd5" --authorizers="local"
--credentials="/private/var/folders/66/mgr662nx7t90lspb7wjg8ctr0000gn/T/aVGDNy/credentials"
--filter_gpu_resources="true" --framework_sorter="drf" --help="false"
--hostname_lookup="true" --http_authenticators="basic"
--http_framework_authenticators="basic" --initialize_driver_logging="true"
--log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO"
--max_agent_ping_timeouts="5" --max_completed_frameworks="50"
--max_completed_tasks_per_framework="1000"
--max_unreachable_tasks_per_framework="1000" --memory_profiling="false"
--min_allocatable_resources="cpus:0.01|mem:32" --port="5050" --quiet="false"
--recovery_agent_removal_limit="100%" --registry="in_memory"
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins"
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400"
--registry_store_timeout="100secs" --registry_strict="false"
--require_agent_domain="false" --role_sorter="drf" --root_submissions="true"
--version="false" --webui_dir="/usr/local/share/mesos/webui"
--work_dir="/private/var/folders/66/mgr662nx7t90lspb7wjg8ctr0000gn/T/aVGDNy/master"
--zk_session_timeout="10secs"
I0907 11:31:42.412191 110624768 master.cpp:465] Master only allowing
authenticated frameworks to register
I0907 11:31:42.412202 110624768 master.cpp:471] Master only allowing
authenticated agents to register
I0907 11:31:42.412210 110624768 master.cpp:477] Master only allowing
authenticated HTTP frameworks to register
I0907 11:31:42.412219 110624768 credentials.hpp:37] Loading credentials for
authentication from
'/private/var/folders/66/mgr662nx7t90lspb7wjg8ctr0000gn/T/aVGDNy/credentials'
I0907 11:31:42.412322 110624768 master.cpp:521] Using default 'crammd5'
authenticator
I0907 11:31:42.412355 110624768 http.cpp:1037] Creating default 'basic' HTTP
authenticator for realm 'mesos-master-readonly'
I0907 11:31:42.412390 110624768 http.cpp:1037] Creating default 'basic' HTTP
authenticator for realm 'mesos-master-readwrite'
I0907 11:31:42.412417 110624768 http.cpp:1037] Creating default 'basic' HTTP
authenticator for realm 'mesos-master-scheduler'
I0907 11:31:42.412439 110624768 master.cpp:602] Authorization enabled
I0907 11:31:42.413738 110624768 master.cpp:2083] Elected as the leading master!
I0907 11:31:42.413750 110624768 master.cpp:1638] Recovering from registrar
I0907 11:31:42.413913 109551616 registrar.cpp:383] Successfully fetched the
registry (0B) in 128us
I0907 11:31:42.413962 109551616 registrar.cpp:487] Applied 1 operations in
19755ns; attempting to update the registry
I0907 11:31:42.414093 109551616 registrar.cpp:544] Successfully updated the
registry in 107008ns
I0907 11:31:42.414126 109551616 registrar.cpp:416] Successfully recovered
registrar
I0907 11:31:42.414232 110624768 master.cpp:1752] Recovered 0 agents from the
registry (162B); allowing 10mins for agents to reregister
I0907 11:31:42.414614 311620992 scheduler.cpp:189] Version: 1.8.0
I0907 11:31:42.415856 113844224 scheduler.cpp:355] Using default 'basic' HTTP
authenticatee
I0907 11:31:42.415974 112771072 scheduler.cpp:538] New master detected at
master@192.168.178.20:54546
I0907 11:31:42.417650 113844224 http.cpp:1177] HTTP POST for
/master/api/v1/scheduler from 192.168.178.20:55273
I0907 11:31:42.417768 113844224 master.cpp:2502] Received subscription request
for HTTP framework 'default'
I0907 11:31:42.417788 113844224 master.cpp:2155] Authorizing framework
principal 'test-principal' to receive offers for roles '{ * }'
I0907 11:31:42.417914 113844224 master.cpp:2637] Subscribing framework
'default' with checkpointing disabled and capabilities [ MULTI_ROLE,
RESERVATION_REFINEMENT ]
I0907 11:31:42.418388 113844224 master.cpp:9883] Adding framework
4450e893-595f-48c2-9ea2-31325fda2c76-0000 (default) with roles { } suppressed
I0907 11:31:42.418522 110624768 hierarchical.cpp:306] Added framework
4450e893-595f-48c2-9ea2-31325fda2c76-0000
I0907 11:31:42.419454 311620992 scheduler.cpp:189] Version: 1.8.0
I0907 11:31:42.420704 110088192 scheduler.cpp:355] Using default 'basic' HTTP
authenticatee
I0907 11:31:42.420807 111161344 scheduler.cpp:538] New master detected at
master@192.168.178.20:54546
I0907 11:31:42.422297 113844224 http.cpp:1177] HTTP POST for
/master/api/v1/scheduler from 192.168.178.20:55275
I0907 11:31:42.422423 113844224 master.cpp:2502] Received subscription request
for HTTP framework 'default'
I0907 11:31:42.422446 113844224 master.cpp:2155] Authorizing framework
principal 'test-principal' to receive offers for roles '{ * }'
I0907 11:31:42.422591 113844224 master.cpp:2637] Subscribing framework
'default' with checkpointing disabled and capabilities [ MULTI_ROLE,
RESERVATION_REFINEMENT ]
I0907 11:31:42.422608 113844224 master.cpp:7760] Updating framework
4450e893-595f-48c2-9ea2-31325fda2c76-0000 (default) with roles { } suppressed
I0907 11:31:42.422904 111161344 master.cpp:1226] Ignoring disconnection for
framework 4450e893-595f-48c2-9ea2-31325fda2c76-0000 (default) as it has already
reconnected
I0907 11:31:42.423132 113844224 scheduler.cpp:512] Re-detecting master
I0907 11:31:42.423475 113844224 scheduler.cpp:538] New master detected at
master@192.168.178.20:54546
../../src/tests/scheduler_tests.cpp:251: Failure
Failed to wait 15secs for error
*** Aborted at 1536312717 (unix time) try "date -d @1536312717" if you are
using GNU date ***
PC: @ 0x10d891ded testing::UnitTest::AddTestPartResult()
*** SIGSEGV (@0x0) received by PID 16639 (TID 0x11292f580) stack trace: ***
@ 0x7fff72af7b3d _sigtramp
@ 0x1108a1a00 (unknown)
@ 0x10d8915e7 testing::internal::AssertHelper::operator=()
@ 0x10cf83948
mesos::internal::tests::SchedulerTest_SchedulerFailover_Test::TestBody()
@ 0x10d904c4e
testing::internal::HandleSehExceptionsInMethodIfSupported<>()
@ 0x10d8a9a9b
testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x10d8a99c6 testing::Test::Run()
@ 0x10d8ab79d testing::TestInfo::Run()
@ 0x10d8acddc testing::TestCase::Run()
@ 0x10d8bd2cc testing::internal::UnitTestImpl::RunAllTests()
@ 0x10d90779e
testing::internal::HandleSehExceptionsInMethodIfSupported<>()
@ 0x10d8bcceb
testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x10d8bcbac testing::UnitTest::Run()
@ 0x10c1f52f1 RUN_ALL_TESTS()
@ 0x10c1f0c9c main
@ 0x7fff7290e0a1 start
Segmentation fault: 11
{noformat}