Till Toenshoff created MESOS-9215:
-------------------------------------

             Summary: SchedulerSubscribeAfterFailoverTimeout times out.
                 Key: MESOS-9215
                 URL: https://issues.apache.org/jira/browse/MESOS-9215
             Project: Mesos
          Issue Type: Bug
          Components: flaky, scheduler api, test
    Affects Versions: 1.8.0
         Environment: Ubuntu 16.04
            Reporter: Till Toenshoff


Observed this on the ASF CI - Ubuntu 16.04, all variants:
{noformat}
mesos-ec2-ubuntu-16.04-Clang.Mesos.HttpFaultToleranceTest.SchedulerSubscribeAfterFailoverTimeout
mesos-ec2-ubuntu-16.04-Plain.Mesos.HttpFaultToleranceTest.SchedulerSubscribeAfterFailoverTimeout
mesos-ec2-ubuntu-16.04-SSL_GRPC.Mesos.HttpFaultToleranceTest.SchedulerSubscribeAfterFailoverTimeout
{noformat}

{noformat}
04:04:32  [ RUN      ] 
HttpFaultToleranceTest.SchedulerSubscribeAfterFailoverTimeout
04:04:32  I0907 04:04:32.274106 31619 cluster.cpp:173] Creating default 'local' 
authorizer
04:04:32  I0907 04:04:32.275048 31642 master.cpp:413] Master 
7979925f-4333-4207-b20f-b7ab6ea59fdd (ip-172-16-10-184.ec2.internal) started on 
172.16.10.184:35708
04:04:32  I0907 04:04:32.275074 31642 master.cpp:416] Flags at startup: 
--acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="hierarchical" 
--authenticate_agents="true" --authenticate_frameworks="false" 
--authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authentication_v0_timeout="15secs" 
--authenticators="crammd5" --authorizers="local" 
--credentials="/tmp/MwIsXa/credentials" --filter_gpu_resources="true" 
--framework_sorter="drf" --help="false" --hostname_lookup="true" 
--http_authenticators="basic" --http_framework_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--max_unreachable_tasks_per_framework="1000" --memory_profiling="false" 
--min_allocatable_resources="cpus:0.01|mem:32" --port="5050" --quiet="false" 
--recovery_agent_removal_limit="100%" --registry="in_memory" 
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
--registry_store_timeout="100secs" --registry_strict="false" 
--require_agent_domain="false" --role_sorter="drf" --root_submissions="true" 
--version="false" --webui_dir="/usr/local/share/mesos/webui" 
--work_dir="/tmp/MwIsXa/master" --zk_session_timeout="10secs"
04:04:32  I0907 04:04:32.275192 31642 master.cpp:467] Master allowing 
unauthenticated frameworks to register
04:04:32  I0907 04:04:32.275202 31642 master.cpp:471] Master only allowing 
authenticated agents to register
04:04:32  I0907 04:04:32.275208 31642 master.cpp:477] Master only allowing 
authenticated HTTP frameworks to register
04:04:32  I0907 04:04:32.275214 31642 credentials.hpp:37] Loading credentials 
for authentication from '/tmp/MwIsXa/credentials'
04:04:32  I0907 04:04:32.275274 31642 master.cpp:521] Using default 'crammd5' 
authenticator
04:04:32  I0907 04:04:32.275319 31642 http.cpp:1037] Creating default 'basic' 
HTTP authenticator for realm 'mesos-master-readonly'
04:04:32  I0907 04:04:32.275353 31642 http.cpp:1037] Creating default 'basic' 
HTTP authenticator for realm 'mesos-master-readwrite'
04:04:32  I0907 04:04:32.275379 31642 http.cpp:1037] Creating default 'basic' 
HTTP authenticator for realm 'mesos-master-scheduler'
04:04:32  I0907 04:04:32.275403 31642 master.cpp:602] Authorization enabled
04:04:32  I0907 04:04:32.275573 31643 hierarchical.cpp:182] Initialized 
hierarchical allocator process
04:04:32  I0907 04:04:32.275609 31643 whitelist_watcher.cpp:77] No whitelist 
given
04:04:32  I0907 04:04:32.276106 31646 master.cpp:2083] Elected as the leading 
master!
04:04:32  I0907 04:04:32.276127 31646 master.cpp:1638] Recovering from registrar
04:04:32  I0907 04:04:32.276168 31646 registrar.cpp:339] Recovering registrar
04:04:32  I0907 04:04:32.276291 31643 registrar.cpp:383] Successfully fetched 
the registry (0B) in 104960ns
04:04:32  I0907 04:04:32.276325 31643 registrar.cpp:487] Applied 1 operations 
in 7790ns; attempting to update the registry
04:04:32  I0907 04:04:32.276428 31646 registrar.cpp:544] Successfully updated 
the registry in 86016ns
04:04:32  I0907 04:04:32.276450 31646 registrar.cpp:416] Successfully recovered 
registrar
04:04:32  I0907 04:04:32.276528 31646 master.cpp:1752] Recovered 0 agents from 
the registry (176B); allowing 10mins for agents to reregister
04:04:32  I0907 04:04:32.276536 31641 hierarchical.cpp:220] Skipping recovery 
of hierarchical allocator: nothing to recover
04:04:32  I0907 04:04:32.276836 31619 scheduler.cpp:189] Version: 1.8.0
04:04:32  I0907 04:04:32.277160 31643 scheduler.cpp:355] Using default 'basic' 
HTTP authenticatee
04:04:32  I0907 04:04:32.277261 31643 scheduler.cpp:538] New master detected at 
[email protected]:35708
04:04:32  I0907 04:04:32.277271 31643 scheduler.cpp:547] Waiting for 0ns before 
initiating a re-(connection) attempt with the master
04:04:32  I0907 04:04:32.277776 31639 scheduler.cpp:429] Connected with the 
master at http://172.16.10.184:35708/master/api/v1/scheduler
04:04:32  I0907 04:04:32.278054 31642 scheduler.cpp:248] Sending SUBSCRIBE call 
to http://172.16.10.184:35708/master/api/v1/scheduler
04:04:32  I0907 04:04:32.278386 31643 process.cpp:3569] Handling HTTP event for 
process 'master' with path: '/master/api/v1/scheduler'
04:04:32  I0907 04:04:32.278705 31644 http.cpp:1177] HTTP POST for 
/master/api/v1/scheduler from 172.16.10.184:39922
04:04:32  I0907 04:04:32.278781 31644 master.cpp:2502] Received subscription 
request for HTTP framework 'default'
04:04:32  I0907 04:04:32.278818 31644 master.cpp:2155] Authorizing framework 
principal 'test-principal' to receive offers for roles '{ * }'
04:04:32  I0907 04:04:32.278971 31644 master.cpp:2637] Subscribing framework 
'default' with checkpointing disabled and capabilities [ MULTI_ROLE, 
RESERVATION_REFINEMENT ]
04:04:32  I0907 04:04:32.279516 31644 master.cpp:9883] Adding framework 
7979925f-4333-4207-b20f-b7ab6ea59fdd-0000 (default) with roles {  } suppressed
04:04:32  I0907 04:04:32.279621 31642 hierarchical.cpp:306] Added framework 
7979925f-4333-4207-b20f-b7ab6ea59fdd-0000
04:04:32  I0907 04:04:32.279753 31642 hierarchical.cpp:1564] Performed 
allocation for 0 agents in 9539ns
04:04:32  I0907 04:04:32.280083 31643 scheduler.cpp:845] Enqueuing event 
SUBSCRIBED received from http://172.16.10.184:35708/master/api/v1/scheduler
04:04:32  I0907 04:04:32.280647 31639 master.cpp:1366] Framework 
7979925f-4333-4207-b20f-b7ab6ea59fdd-0000 (default) disconnected
04:04:32  I0907 04:04:32.280664 31639 master.cpp:3230] Deactivating framework 
7979925f-4333-4207-b20f-b7ab6ea59fdd-0000 (default)
04:04:32  I0907 04:04:32.280678 31639 master.cpp:3207] Disconnecting framework 
7979925f-4333-4207-b20f-b7ab6ea59fdd-0000 (default)
04:04:32  I0907 04:04:32.280686 31639 master.cpp:1381] Giving framework 
7979925f-4333-4207-b20f-b7ab6ea59fdd-0000 (default) 2weeks to failover
04:04:32  I0907 04:04:32.280725 31639 hierarchical.cpp:420] Deactivated 
framework 7979925f-4333-4207-b20f-b7ab6ea59fdd-0000
04:04:32  I0907 04:04:32.284456 31640 hierarchical.cpp:1564] Performed 
allocation for 0 agents in 7296ns
04:04:32  I0907 04:04:32.284528 31640 master.cpp:1827] Skipping periodic 
registry garbage collection: no agents qualify for removal
04:04:32  I0907 04:04:32.284554 31640 master.cpp:9261] Framework failover 
timeout, removing framework 7979925f-4333-4207-b20f-b7ab6ea59fdd-0000 (default)
04:04:32  I0907 04:04:32.284564 31640 master.cpp:10197] Removing framework 
7979925f-4333-4207-b20f-b7ab6ea59fdd-0000 (default)
04:04:32  I0907 04:04:32.284636 31643 hierarchical.cpp:359] Removed framework 
7979925f-4333-4207-b20f-b7ab6ea59fdd-0000
04:04:32  I0907 04:04:32.284920 31619 scheduler.cpp:189] Version: 1.8.0
04:04:32  I0907 04:04:32.285246 31642 scheduler.cpp:355] Using default 'basic' 
HTTP authenticatee
04:04:32  I0907 04:04:32.285333 31642 scheduler.cpp:538] New master detected at 
[email protected]:35708
04:04:32  I0907 04:04:32.285346 31642 scheduler.cpp:547] Waiting for 0ns before 
initiating a re-(connection) attempt with the master
04:04:32  I0907 04:04:32.285820 31643 scheduler.cpp:429] Connected with the 
master at http://172.16.10.184:35708/master/api/v1/scheduler
04:04:32  I0907 04:04:32.286113 31645 scheduler.cpp:248] Sending SUBSCRIBE call 
to http://172.16.10.184:35708/master/api/v1/scheduler
04:04:32  I0907 04:04:32.286424 31640 process.cpp:3569] Handling HTTP event for 
process 'master' with path: '/master/api/v1/scheduler'
04:04:32  I0907 04:04:32.286682 31640 http.cpp:1177] HTTP POST for 
/master/api/v1/scheduler from 172.16.10.184:39926
04:04:32  I0907 04:04:32.286759 31640 master.cpp:2502] Received subscription 
request for HTTP framework 'default'
04:04:32  I0907 04:04:32.286778 31640 master.cpp:2574] Refusing subscription of 
framework 'default': Framework has been removed
04:04:32  I0907 04:04:32.292038 31646 scheduler.cpp:512] Re-detecting master
04:04:32  I0907 04:04:32.292126 31646 scheduler.cpp:799] Ignoring event from 
old stale connection
04:04:32  I0907 04:04:32.292181 31646 scheduler.cpp:463] Ignoring disconnection 
attempt from stale connection
04:04:32  I0907 04:04:32.292212 31646 scheduler.cpp:538] New master detected at 
[email protected]:35708
04:04:32  I0907 04:04:32.292228 31646 scheduler.cpp:547] Waiting for 0ns before 
initiating a re-(connection) attempt with the master
04:04:32  I0907 04:04:32.292666 31644 scheduler.cpp:429] Connected with the 
master at http://172.16.10.184:35708/master/api/v1/scheduler
04:04:33  I0907 04:04:33.285504 31643 hierarchical.cpp:1564] Performed 
allocation for 0 agents in 11191ns
04:04:34  I0907 04:04:34.286386 31644 hierarchical.cpp:1564] Performed 
allocation for 0 agents in 11069ns
04:04:35  I0907 04:04:35.287335 31639 hierarchical.cpp:1564] Performed 
allocation for 0 agents in 10568ns
04:04:36  I0907 04:04:36.288154 31640 hierarchical.cpp:1564] Performed 
allocation for 0 agents in 11117ns
04:04:37  I0907 04:04:37.289096 31646 hierarchical.cpp:1564] Performed 
allocation for 0 agents in 11680ns
04:04:38  I0907 04:04:38.289909 31642 hierarchical.cpp:1564] Performed 
allocation for 0 agents in 11237ns
04:04:39  I0907 04:04:39.290695 31641 hierarchical.cpp:1564] Performed 
allocation for 0 agents in 11360ns
04:04:40  I0907 04:04:40.291617 31645 hierarchical.cpp:1564] Performed 
allocation for 0 agents in 11394ns
04:04:41  I0907 04:04:41.292557 31643 hierarchical.cpp:1564] Performed 
allocation for 0 agents in 12273ns
04:04:42  I0907 04:04:42.293593 31644 hierarchical.cpp:1564] Performed 
allocation for 0 agents in 11371ns
04:04:43  I0907 04:04:43.294623 31639 hierarchical.cpp:1564] Performed 
allocation for 0 agents in 12066ns
04:04:44  I0907 04:04:44.295605 31640 hierarchical.cpp:1564] Performed 
allocation for 0 agents in 12010ns
04:04:45  I0907 04:04:45.296494 31646 hierarchical.cpp:1564] Performed 
allocation for 0 agents in 11259ns
04:04:46  I0907 04:04:46.297247 31642 hierarchical.cpp:1564] Performed 
allocation for 0 agents in 11839ns
04:04:47  ../../src/tests/http_fault_tolerance_tests.cpp:200: Failure
04:04:47  Failed to wait 15secs for error
04:04:47  ../../src/tests/http_fault_tolerance_tests.cpp:182: Failure
04:04:47  Actual function call count doesn't match EXPECT_CALL(*scheduler, 
error(_, _))...
04:04:47           Expected: to be called once
04:04:47             Actual: never called - unsatisfied and active
04:04:47  I0907 04:04:47.287803 31619 master.cpp:1093] Master terminating
04:04:47  [  FAILED  ] 
HttpFaultToleranceTest.SchedulerSubscribeAfterFailoverTimeout (15016 ms)
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to