[
https://issues.apache.org/jira/browse/MESOS-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14944114#comment-14944114
]
Benjamin Mahler commented on MESOS-3577:
----------------------------------------
[~anandmazumdar] thanks for filing! It's really helpful to include both good
and bad output in the description. Makes it easier to see where the failing
case went wrong. Please help me encourage others to do this when filing test
failures :)
> OversubscriptionTest.UpdateAllocatorOnSchedulerFailover is flaky
> ----------------------------------------------------------------
>
> Key: MESOS-3577
> URL: https://issues.apache.org/jira/browse/MESOS-3577
> Project: Mesos
> Issue Type: Bug
> Reporter: Anand Mazumdar
> Labels: flaky-test
>
> Showed up on ASF CI
> https://builds.apache.org/job/Mesos/890/COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,OS=centos:7,label_exp=docker%7C%7CHadoop/consoleFull
> {code}
> [ RUN ] OversubscriptionTest.UpdateAllocatorOnSchedulerFailover
> Using temporary directory
> '/tmp/OversubscriptionTest_UpdateAllocatorOnSchedulerFailover_y5LK6v'
> I1003 20:29:03.367100 31549 leveldb.cpp:176] Opened db in 2.322276ms
> I1003 20:29:03.368028 31549 leveldb.cpp:183] Compacted db in 888247ns
> I1003 20:29:03.368093 31549 leveldb.cpp:198] Created db iterator in 22626ns
> I1003 20:29:03.368108 31549 leveldb.cpp:204] Seeked to beginning of db in
> 1842ns
> I1003 20:29:03.368115 31549 leveldb.cpp:273] Iterated through 0 keys in the
> db in 395ns
> I1003 20:29:03.368165 31549 replica.cpp:744] Replica recovered with log
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1003 20:29:03.368722 31575 recover.cpp:449] Starting replica recovery
> I1003 20:29:03.369118 31575 recover.cpp:475] Replica is in EMPTY status
> I1003 20:29:03.370707 31572 replica.cpp:641] Replica in EMPTY status received
> a broadcasted recover request
> I1003 20:29:03.371100 31572 master.cpp:376] Master
> d4ff5e08-2202-4f3b-8fb2-5515adf9a97e (9efc27440ed0) started on
> 172.17.5.73:38504
> I1003 20:29:03.371124 31572 master.cpp:378] Flags at startup: --acls=""
> --allocation_interval="1secs" --allocator="HierarchicalDRF"
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5"
> --authorizers="local"
> --credentials="/tmp/OversubscriptionTest_UpdateAllocatorOnSchedulerFailover_y5LK6v/credentials"
> --framework_sorter="drf" --help="false" --hostname_lookup="true"
> --initialize_driver_logging="true" --log_auto_initialize="true"
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5"
> --quiet="false" --recovery_slave_removal_limit="100%"
> --registry="replicated_log" --registry_fetch_timeout="1mins"
> --registry_store_timeout="25secs" --registry_strict="true"
> --root_submissions="true" --slave_ping_timeout="15secs"
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false"
> --webui_dir="/mesos/mesos-0.26.0/_inst/share/mesos/webui"
> --work_dir="/tmp/OversubscriptionTest_UpdateAllocatorOnSchedulerFailover_y5LK6v/master"
> --zk_session_timeout="10secs"
> I1003 20:29:03.371477 31572 master.cpp:423] Master only allowing
> authenticated frameworks to register
> I1003 20:29:03.371496 31572 master.cpp:428] Master only allowing
> authenticated slaves to register
> I1003 20:29:03.371510 31572 credentials.hpp:37] Loading credentials for
> authentication from
> '/tmp/OversubscriptionTest_UpdateAllocatorOnSchedulerFailover_y5LK6v/credentials'
> I1003 20:29:03.371841 31572 master.cpp:467] Using default 'crammd5'
> authenticator
> I1003 20:29:03.371989 31572 master.cpp:504] Authorization enabled
> I1003 20:29:03.372009 31580 recover.cpp:195] Received a recover response from
> a replica in EMPTY status
> I1003 20:29:03.372231 31568 hierarchical.hpp:468] Initialized hierarchical
> allocator process
> I1003 20:29:03.372349 31579 whitelist_watcher.cpp:79] No whitelist given
> I1003 20:29:03.373409 31572 recover.cpp:566] Updating replica status to
> STARTING
> I1003 20:29:03.373558 31576 master.cpp:1603] The newly elected leader is
> [email protected]:38504 with id d4ff5e08-2202-4f3b-8fb2-5515adf9a97e
> I1003 20:29:03.373670 31576 master.cpp:1616] Elected as the leading master!
> I1003 20:29:03.373775 31576 master.cpp:1376] Recovering from registrar
> I1003 20:29:03.374174 31579 leveldb.cpp:306] Persisting metadata (8 bytes) to
> leveldb took 611973ns
> I1003 20:29:03.374233 31581 registrar.cpp:309] Recovering registrar
> I1003 20:29:03.374248 31579 replica.cpp:323] Persisted replica status to
> STARTING
> I1003 20:29:03.374455 31579 recover.cpp:475] Replica is in STARTING status
> I1003 20:29:03.375416 31576 replica.cpp:641] Replica in STARTING status
> received a broadcasted recover request
> I1003 20:29:03.375880 31575 recover.cpp:195] Received a recover response from
> a replica in STARTING status
> I1003 20:29:03.376230 31576 recover.cpp:566] Updating replica status to VOTING
> I1003 20:29:03.376729 31580 leveldb.cpp:306] Persisting metadata (8 bytes) to
> leveldb took 370830ns
> I1003 20:29:03.376752 31580 replica.cpp:323] Persisted replica status to
> VOTING
> I1003 20:29:03.376893 31580 recover.cpp:580] Successfully joined the Paxos
> group
> I1003 20:29:03.377115 31580 recover.cpp:464] Recover process terminated
> I1003 20:29:03.377531 31569 log.cpp:661] Attempting to start the writer
> I1003 20:29:03.378665 31583 replica.cpp:477] Replica received implicit
> promise request with proposal 1
> I1003 20:29:03.379005 31583 leveldb.cpp:306] Persisting metadata (8 bytes) to
> leveldb took 316870ns
> I1003 20:29:03.379026 31583 replica.cpp:345] Persisted promised to 1
> I1003 20:29:03.379591 31568 coordinator.cpp:231] Coordinator attemping to
> fill missing position
> I1003 20:29:03.380627 31581 replica.cpp:378] Replica received explicit
> promise request for position 0 with proposal 2
> I1003 20:29:03.380982 31581 leveldb.cpp:343] Persisting action (8 bytes) to
> leveldb took 320435ns
> I1003 20:29:03.381006 31581 replica.cpp:679] Persisted action at 0
> I1003 20:29:03.381953 31570 replica.cpp:511] Replica received write request
> for position 0
> I1003 20:29:03.382004 31570 leveldb.cpp:438] Reading position from leveldb
> took 27214ns
> I1003 20:29:03.382390 31570 leveldb.cpp:343] Persisting action (14 bytes) to
> leveldb took 346853ns
> I1003 20:29:03.382414 31570 replica.cpp:679] Persisted action at 0
> I1003 20:29:03.382910 31572 replica.cpp:658] Replica received learned notice
> for position 0
> I1003 20:29:03.383209 31572 leveldb.cpp:343] Persisting action (16 bytes) to
> leveldb took 272453ns
> I1003 20:29:03.383231 31572 replica.cpp:679] Persisted action at 0
> I1003 20:29:03.383245 31572 replica.cpp:664] Replica learned NOP action at
> position 0
> I1003 20:29:03.383776 31583 log.cpp:677] Writer started with ending position 0
> I1003 20:29:03.384682 31573 leveldb.cpp:438] Reading position from leveldb
> took 25930ns
> I1003 20:29:03.385607 31573 registrar.cpp:342] Successfully fetched the
> registry (0B) in 11.33184ms
> I1003 20:29:03.385702 31573 registrar.cpp:441] Applied 1 operations in
> 20556ns; attempting to update the 'registry'
> I1003 20:29:03.386354 31575 log.cpp:685] Attempting to append 174 bytes to
> the log
> I1003 20:29:03.386463 31570 coordinator.cpp:341] Coordinator attempting to
> write APPEND action at position 1
> I1003 20:29:03.387131 31577 replica.cpp:511] Replica received write request
> for position 1
> I1003 20:29:03.387465 31577 leveldb.cpp:343] Persisting action (193 bytes) to
> leveldb took 304517ns
> I1003 20:29:03.387487 31577 replica.cpp:679] Persisted action at 1
> I1003 20:29:03.388046 31577 replica.cpp:658] Replica received learned notice
> for position 1
> I1003 20:29:03.388494 31577 leveldb.cpp:343] Persisting action (195 bytes) to
> leveldb took 422691ns
> I1003 20:29:03.388515 31577 replica.cpp:679] Persisted action at 1
> I1003 20:29:03.388530 31577 replica.cpp:664] Replica learned APPEND action at
> position 1
> I1003 20:29:03.389806 31581 log.cpp:704] Attempting to truncate the log to 1
> I1003 20:29:03.389914 31573 registrar.cpp:486] Successfully updated the
> 'registry' in 4.150016ms
> I1003 20:29:03.389978 31583 coordinator.cpp:341] Coordinator attempting to
> write TRUNCATE action at position 2
> I1003 20:29:03.390210 31573 registrar.cpp:372] Successfully recovered
> registrar
> I1003 20:29:03.390650 31583 master.cpp:1413] Recovered 0 slaves from the
> Registry (135B) ; allowing 10mins for slaves to re-register
> I1003 20:29:03.391228 31575 replica.cpp:511] Replica received write request
> for position 2
> I1003 20:29:03.391638 31575 leveldb.cpp:343] Persisting action (16 bytes) to
> leveldb took 371414ns
> I1003 20:29:03.391661 31575 replica.cpp:679] Persisted action at 2
> I1003 20:29:03.392222 31570 replica.cpp:658] Replica received learned notice
> for position 2
> I1003 20:29:03.392685 31570 leveldb.cpp:343] Persisting action (18 bytes) to
> leveldb took 362209ns
> I1003 20:29:03.392747 31570 leveldb.cpp:401] Deleting ~1 keys from leveldb
> took 35127ns
> I1003 20:29:03.392772 31570 replica.cpp:679] Persisted action at 2
> I1003 20:29:03.392799 31570 replica.cpp:664] Replica learned TRUNCATE action
> at position 2
> I1003 20:29:03.404861 31575 slave.cpp:190] Slave started on
> 181)@172.17.5.73:38504
> I1003 20:29:03.405005 31575 slave.cpp:191] Flags at startup:
> --appc_store_dir="/tmp/mesos/store/appc" --authenticatee="crammd5"
> --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false"
> --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false"
> --cgroups_root="mesos" --container_disk_watch_interval="15secs"
> --containerizers="mesos"
> --credential="/tmp/OversubscriptionTest_UpdateAllocatorOnSchedulerFailover_zWCYr1/credential"
> --default_role="*" --disk_watch_interval="1mins" --docker="docker"
> --docker_kill_orphans="true"
> --docker_local_archives_dir="/tmp/mesos/images/docker"
> --docker_puller="local" --docker_remove_delay="6hrs"
> --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns"
> --docker_store_dir="/tmp/mesos/store/docker"
> --enforce_container_disk_quota="false"
> --executor_registration_timeout="1mins"
> --executor_shutdown_grace_period="5secs"
> --fetcher_cache_dir="/tmp/OversubscriptionTest_UpdateAllocatorOnSchedulerFailover_zWCYr1/fetch"
> --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks"
> --gc_disk_headroom="0.1" --hadoop_home="" --help="false"
> --hostname_lookup="true" --image_provisioner_backend="copy"
> --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem"
> --launcher_dir="/mesos/mesos-0.26.0/_build/src" --logbufsecs="0"
> --logging_level="INFO" --oversubscribed_resources_interval="15secs"
> --perf_duration="10secs" --perf_interval="1mins"
> --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect"
> --recovery_timeout="15mins" --registration_backoff_factor="10ms"
> --resource_monitoring_interval="1secs"
> --resources="cpus:2;mem:1024;disk:1024;ports:[31000-32000]"
> --revocable_cpu_low_priority="true" --sandbox_directory="/mnt/mesos/sandbox"
> --strict="true" --switch_user="true"
> --systemd_runtime_directory="/run/systemd/system" --version="false"
> --work_dir="/tmp/OversubscriptionTest_UpdateAllocatorOnSchedulerFailover_zWCYr1"
> I1003 20:29:03.405575 31575 credentials.hpp:85] Loading credential for
> authentication from
> '/tmp/OversubscriptionTest_UpdateAllocatorOnSchedulerFailover_zWCYr1/credential'
> I1003 20:29:03.405788 31575 slave.cpp:321] Slave using credential for:
> test-principal
> I1003 20:29:03.406411 31575 slave.cpp:354] Slave resources: cpus(*):2;
> mem(*):1024; disk(*):1024; ports(*):[31000-32000]
> I1003 20:29:03.406519 31575 slave.cpp:390] Slave hostname: 9efc27440ed0
> I1003 20:29:03.406545 31575 slave.cpp:395] Slave checkpoint: true
> I1003 20:29:03.406986 31549 sched.cpp:164] Version: 0.26.0
> I1003 20:29:03.407536 31571 state.cpp:54] Recovering state from
> '/tmp/OversubscriptionTest_UpdateAllocatorOnSchedulerFailover_zWCYr1/meta'
> I1003 20:29:03.407918 31575 status_update_manager.cpp:202] Recovering status
> update manager
> I1003 20:29:03.408010 31568 sched.cpp:262] New master detected at
> [email protected]:38504
> I1003 20:29:03.408089 31568 sched.cpp:318] Authenticating with master
> [email protected]:38504
> I1003 20:29:03.408109 31568 sched.cpp:325] Using default CRAM-MD5
> authenticatee
> I1003 20:29:03.408393 31580 authenticatee.cpp:115] Creating new client SASL
> connection
> I1003 20:29:03.408444 31575 slave.cpp:4110] Finished recovery
> I1003 20:29:03.408839 31568 master.cpp:5138] Authenticating
> [email protected]:38504
> I1003 20:29:03.408946 31580 authenticator.cpp:407] Starting authentication
> session for crammd5_authenticatee(427)@172.17.5.73:38504
> I1003 20:29:03.408967 31575 slave.cpp:4267] Querying resource estimator for
> oversubscribable resources
> I1003 20:29:03.409216 31575 authenticator.cpp:92] Creating new server SASL
> connection
> I1003 20:29:03.409247 31568 slave.cpp:705] New master detected at
> [email protected]:38504
> I1003 20:29:03.409253 31570 status_update_manager.cpp:176] Pausing sending
> status updates
> I1003 20:29:03.409327 31568 slave.cpp:768] Authenticating with master
> [email protected]:38504
> I1003 20:29:03.409368 31568 slave.cpp:773] Using default CRAM-MD5
> authenticatee
> I1003 20:29:03.409461 31575 authenticatee.cpp:206] Received SASL
> authentication mechanisms: CRAM-MD5
> I1003 20:29:03.409498 31575 authenticatee.cpp:232] Attempting to authenticate
> with mechanism 'CRAM-MD5'
> I1003 20:29:03.409525 31568 slave.cpp:741] Detecting new master
> I1003 20:29:03.409628 31575 authenticatee.cpp:115] Creating new client SASL
> connection
> I1003 20:29:03.409632 31581 authenticator.cpp:197] Received SASL
> authentication start
> I1003 20:29:03.409901 31581 authenticator.cpp:319] Authentication requires
> more steps
> I1003 20:29:03.409970 31575 master.cpp:5138] Authenticating
> slave(181)@172.17.5.73:38504
> I1003 20:29:03.410159 31569 authenticatee.cpp:252] Received SASL
> authentication step
> I1003 20:29:03.410233 31575 authenticator.cpp:407] Starting authentication
> session for crammd5_authenticatee(428)@172.17.5.73:38504
> I1003 20:29:03.410276 31569 authenticator.cpp:225] Received SASL
> authentication step
> I1003 20:29:03.410311 31569 auxprop.cpp:102] Request to lookup properties for
> user: 'test-principal' realm: '9efc27440ed0' server FQDN: '9efc27440ed0'
> SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false
> SASL_AUXPROP_AUTHZID: false
> I1003 20:29:03.410331 31569 auxprop.cpp:174] Looking up auxiliary property
> '*userPassword'
> I1003 20:29:03.410387 31569 auxprop.cpp:174] Looking up auxiliary property
> '*cmusaslsecretCRAM-MD5'
> I1003 20:29:03.410429 31569 auxprop.cpp:102] Request to lookup properties for
> user: 'test-principal' realm: '9efc27440ed0' server FQDN: '9efc27440ed0'
> SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false
> SASL_AUXPROP_AUTHZID: true
> I1003 20:29:03.410454 31569 auxprop.cpp:124] Skipping auxiliary property
> '*userPassword' since SASL_AUXPROP_AUTHZID == true
> I1003 20:29:03.410467 31569 auxprop.cpp:124] Skipping auxiliary property
> '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
> I1003 20:29:03.410490 31569 authenticator.cpp:311] Authentication success
> I1003 20:29:03.410523 31575 authenticator.cpp:92] Creating new server SASL
> connection
> I1003 20:29:03.410677 31569 authenticatee.cpp:292] Authentication success
> I1003 20:29:03.410703 31575 master.cpp:5168] Successfully authenticated
> principal 'test-principal' at
> [email protected]:38504
> I1003 20:29:03.410791 31569 authenticator.cpp:425] Authentication session
> cleanup for crammd5_authenticatee(427)@172.17.5.73:38504
> I1003 20:29:03.410836 31575 authenticatee.cpp:206] Received SASL
> authentication mechanisms: CRAM-MD5
> I1003 20:29:03.410961 31575 authenticatee.cpp:232] Attempting to authenticate
> with mechanism 'CRAM-MD5'
> I1003 20:29:03.411046 31571 sched.cpp:407] Successfully authenticated with
> master [email protected]:38504
> I1003 20:29:03.411075 31571 sched.cpp:714] Sending SUBSCRIBE call to
> [email protected]:38504
> I1003 20:29:03.411090 31582 authenticator.cpp:197] Received SASL
> authentication start
> I1003 20:29:03.411159 31582 authenticator.cpp:319] Authentication requires
> more steps
> I1003 20:29:03.411180 31571 sched.cpp:747] Will retry registration in
> 1.121725197secs if necessary
> I1003 20:29:03.411272 31582 authenticatee.cpp:252] Received SASL
> authentication step
> I1003 20:29:03.411303 31581 master.cpp:2179] Received SUBSCRIBE call for
> framework 'default' at
> [email protected]:38504
> I1003 20:29:03.411492 31581 master.cpp:1642] Authorizing framework principal
> 'test-principal' to receive offers for role '*'
> I1003 20:29:03.411509 31575 authenticator.cpp:225] Received SASL
> authentication step
> I1003 20:29:03.411540 31575 auxprop.cpp:102] Request to lookup properties for
> user: 'test-principal' realm: '9efc27440ed0' server FQDN: '9efc27440ed0'
> SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false
> SASL_AUXPROP_AUTHZID: false
> I1003 20:29:03.411555 31575 auxprop.cpp:174] Looking up auxiliary property
> '*userPassword'
> I1003 20:29:03.411581 31575 auxprop.cpp:174] Looking up auxiliary property
> '*cmusaslsecretCRAM-MD5'
> I1003 20:29:03.411599 31575 auxprop.cpp:102] Request to lookup properties for
> user: 'test-principal' realm: '9efc27440ed0' server FQDN: '9efc27440ed0'
> SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false
> SASL_AUXPROP_AUTHZID: true
> I1003 20:29:03.411608 31575 auxprop.cpp:124] Skipping auxiliary property
> '*userPassword' since SASL_AUXPROP_AUTHZID == true
> I1003 20:29:03.411619 31575 auxprop.cpp:124] Skipping auxiliary property
> '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
> I1003 20:29:03.411635 31575 authenticator.cpp:311] Authentication success
> I1003 20:29:03.411770 31576 authenticatee.cpp:292] Authentication success
> I1003 20:29:03.411768 31575 master.cpp:5168] Successfully authenticated
> principal 'test-principal' at slave(181)@172.17.5.73:38504
> I1003 20:29:03.411994 31575 authenticator.cpp:425] Authentication session
> cleanup for crammd5_authenticatee(428)@172.17.5.73:38504
> I1003 20:29:03.412241 31576 slave.cpp:836] Successfully authenticated with
> master [email protected]:38504
> I1003 20:29:03.412360 31576 slave.cpp:1230] Will retry registration in
> 10.113977ms if necessary
> I1003 20:29:03.412411 31581 master.cpp:2250] Subscribing framework default
> with checkpointing disabled and capabilities [ ]
> I1003 20:29:03.412767 31576 hierarchical.hpp:515] Added framework
> d4ff5e08-2202-4f3b-8fb2-5515adf9a97e-0000
> I1003 20:29:03.412798 31576 hierarchical.hpp:1328] No resources available to
> allocate!
> I1003 20:29:03.412837 31576 hierarchical.hpp:1423] No inverse offers to send
> out!
> I1003 20:29:03.412858 31576 hierarchical.hpp:1223] Performed allocation for 0
> slaves in 70147ns
> I1003 20:29:03.413169 31571 sched.cpp:641] Framework registered with
> d4ff5e08-2202-4f3b-8fb2-5515adf9a97e-0000
> I1003 20:29:03.413228 31571 sched.cpp:655] Scheduler::registered took 31641ns
> I1003 20:29:03.413275 31581 master.cpp:3862] Registering slave at
> slave(181)@172.17.5.73:38504 (9efc27440ed0) with id
> d4ff5e08-2202-4f3b-8fb2-5515adf9a97e-S0
> I1003 20:29:03.413812 31574 registrar.cpp:441] Applied 1 operations in
> 58662ns; attempting to update the 'registry'
> I1003 20:29:03.414679 31574 log.cpp:685] Attempting to append 343 bytes to
> the log
> I1003 20:29:03.414809 31572 coordinator.cpp:341] Coordinator attempting to
> write APPEND action at position 3
> I1003 20:29:03.415621 31576 replica.cpp:511] Replica received write request
> for position 3
> I1003 20:29:03.415855 31576 leveldb.cpp:343] Persisting action (362 bytes) to
> leveldb took 196430ns
> I1003 20:29:03.415885 31576 replica.cpp:679] Persisted action at 3
> I1003 20:29:03.416429 31575 replica.cpp:658] Replica received learned notice
> for position 3
> I1003 20:29:03.416859 31575 leveldb.cpp:343] Persisting action (364 bytes) to
> leveldb took 402997ns
> I1003 20:29:03.416882 31575 replica.cpp:679] Persisted action at 3
> I1003 20:29:03.416903 31575 replica.cpp:664] Replica learned APPEND action at
> position 3
> I1003 20:29:03.418669 31577 registrar.cpp:486] Successfully updated the
> 'registry' in 4.766976ms
> I1003 20:29:03.418918 31580 log.cpp:704] Attempting to truncate the log to 3
> I1003 20:29:03.419070 31576 coordinator.cpp:341] Coordinator attempting to
> write TRUNCATE action at position 4
> I1003 20:29:03.419718 31580 slave.cpp:3138] Received ping from
> slave-observer(181)@172.17.5.73:38504
> I1003 20:29:03.419813 31575 master.cpp:3930] Registered slave
> d4ff5e08-2202-4f3b-8fb2-5515adf9a97e-S0 at slave(181)@172.17.5.73:38504
> (9efc27440ed0) with cpus(*):2; mem(*):1024; disk(*):1024;
> ports(*):[31000-32000]
> I1003 20:29:03.419993 31568 replica.cpp:511] Replica received write request
> for position 4
> I1003 20:29:03.420011 31570 hierarchical.hpp:675] Added slave
> d4ff5e08-2202-4f3b-8fb2-5515adf9a97e-S0 (9efc27440ed0) with cpus(*):2;
> mem(*):1024; disk(*):1024; ports(*):[31000-32000] (allocated: )
> I1003 20:29:03.420229 31580 slave.cpp:880] Registered with master
> [email protected]:38504; given slave ID
> d4ff5e08-2202-4f3b-8fb2-5515adf9a97e-S0
> I1003 20:29:03.420261 31580 fetcher.cpp:77] Clearing fetcher cache
> I1003 20:29:03.420368 31568 leveldb.cpp:343] Persisting action (16 bytes) to
> leveldb took 340897ns
> I1003 20:29:03.420497 31568 replica.cpp:679] Persisted action at 4
> I1003 20:29:03.420442 31575 status_update_manager.cpp:183] Resuming sending
> status updates
> I1003 20:29:03.420678 31580 slave.cpp:903] Checkpointing SlaveInfo to
> '/tmp/OversubscriptionTest_UpdateAllocatorOnSchedulerFailover_zWCYr1/meta/slaves/d4ff5e08-2202-4f3b-8fb2-5515adf9a97e-S0/slave.info'
> I1003 20:29:03.420812 31570 hierarchical.hpp:1423] No inverse offers to send
> out!
> I1003 20:29:03.420948 31570 hierarchical.hpp:1241] Performed allocation for
> slave d4ff5e08-2202-4f3b-8fb2-5515adf9a97e-S0 in 893988ns
> I1003 20:29:03.421216 31579 master.cpp:4967] Sending 1 offers to framework
> d4ff5e08-2202-4f3b-8fb2-5515adf9a97e-0000 (default) at
> [email protected]:38504
> I1003 20:29:03.421609 31578 replica.cpp:658] Replica received learned notice
> for position 4
> I1003 20:29:03.421767 31573 sched.cpp:811] Scheduler::resourceOffers took
> 170881ns
> I1003 20:29:03.422123 31578 leveldb.cpp:343] Persisting action (18 bytes) to
> leveldb took 410124ns
> I1003 20:29:03.422200 31578 leveldb.cpp:401] Deleting ~2 keys from leveldb
> took 48863ns
> I1003 20:29:03.422225 31578 replica.cpp:679] Persisted action at 4
> I1003 20:29:03.422255 31578 replica.cpp:664] Replica learned TRUNCATE action
> at position 4
> I1003 20:29:03.423061 31549 sched.cpp:164] Version: 0.26.0
> I1003 20:29:03.423636 31576 sched.cpp:262] New master detected at
> [email protected]:38504
> I1003 20:29:03.423720 31576 sched.cpp:318] Authenticating with master
> [email protected]:38504
> I1003 20:29:03.423743 31576 sched.cpp:325] Using default CRAM-MD5
> authenticatee
> I1003 20:29:03.424024 31581 authenticatee.cpp:115] Creating new client SASL
> connection
> I1003 20:29:03.424335 31582 master.cpp:5138] Authenticating
> [email protected]:38504
> I1003 20:29:03.424444 31578 authenticator.cpp:407] Starting authentication
> session for crammd5_authenticatee(429)@172.17.5.73:38504
> I1003 20:29:03.424655 31568 authenticator.cpp:92] Creating new server SASL
> connection
> I1003 20:29:03.424839 31576 authenticatee.cpp:206] Received SASL
> authentication mechanisms: CRAM-MD5
> I1003 20:29:03.424865 31576 authenticatee.cpp:232] Attempting to authenticate
> with mechanism 'CRAM-MD5'
> I1003 20:29:03.424943 31576 authenticator.cpp:197] Received SASL
> authentication start
> I1003 20:29:03.424993 31576 authenticator.cpp:319] Authentication requires
> more steps
> I1003 20:29:03.425070 31576 authenticatee.cpp:252] Received SASL
> authentication step
> I1003 20:29:03.425150 31576 authenticator.cpp:225] Received SASL
> authentication step
> I1003 20:29:03.425184 31576 auxprop.cpp:102] Request to lookup properties for
> user: 'test-principal' realm: '9efc27440ed0' server FQDN: '9efc27440ed0'
> SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false
> SASL_AUXPROP_AUTHZID: false
> I1003 20:29:03.425200 31576 auxprop.cpp:174] Looking up auxiliary property
> '*userPassword'
> I1003 20:29:03.425254 31576 auxprop.cpp:174] Looking up auxiliary property
> '*cmusaslsecretCRAM-MD5'
> I1003 20:29:03.425292 31576 auxprop.cpp:102] Request to lookup properties for
> user: 'test-principal' realm: '9efc27440ed0' server FQDN: '9efc27440ed0'
> SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false
> SASL_AUXPROP_AUTHZID: true
> I1003 20:29:03.425310 31576 auxprop.cpp:124] Skipping auxiliary property
> '*userPassword' since SASL_AUXPROP_AUTHZID == true
> I1003 20:29:03.425323 31576 auxprop.cpp:124] Skipping auxiliary property
> '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
> I1003 20:29:03.425344 31576 authenticator.cpp:311] Authentication success
> I1003 20:29:03.425436 31570 authenticatee.cpp:292] Authentication success
> I1003 20:29:03.425534 31574 master.cpp:5168] Successfully authenticated
> principal 'test-principal' at
> [email protected]:38504
> I1003 20:29:03.425539 31578 authenticator.cpp:425] Authentication session
> cleanup for crammd5_authenticatee(429)@172.17.5.73:38504
> I1003 20:29:03.425843 31569 sched.cpp:407] Successfully authenticated with
> master [email protected]:38504
> I1003 20:29:03.425865 31569 sched.cpp:714] Sending SUBSCRIBE call to
> [email protected]:38504
> I1003 20:29:03.425961 31569 sched.cpp:747] Will retry registration in
> 1.710335618secs if necessary
> I1003 20:29:03.426127 31576 master.cpp:2179] Received SUBSCRIBE call for
> framework 'default' at
> [email protected]:38504
> I1003 20:29:03.426195 31576 master.cpp:1642] Authorizing framework principal
> 'test-principal' to receive offers for role '*'
> I1003 20:29:03.426439 31570 master.cpp:2250] Subscribing framework default
> with checkpointing disabled and capabilities [ REVOCABLE_RESOURCES ]
> I1003 20:29:03.426496 31570 master.cpp:2314] Updating info for framework
> d4ff5e08-2202-4f3b-8fb2-5515adf9a97e-0000
> I1003 20:29:03.426590 31570 master.cpp:2327] Framework
> d4ff5e08-2202-4f3b-8fb2-5515adf9a97e-0000 (default) at
> [email protected]:38504 failed over
> I1003 20:29:03.426753 31579 sched.cpp:1024] Got error 'Framework failed over'
> I1003 20:29:03.426776 31579 sched.cpp:1805] Asked to abort the driver
> I1003 20:29:03.426867 31579 sched.cpp:1035] Scheduler::error took 52362ns
> I1003 20:29:03.427084 31579 sched.cpp:1070] Aborting framework
> 'd4ff5e08-2202-4f3b-8fb2-5515adf9a97e-0000'
> I1003 20:29:03.427255 31578 hierarchical.hpp:1103] Recovered cpus(*):2;
> mem(*):1024; disk(*):1024; ports(*):[31000-32000] (total: cpus(*):2;
> mem(*):1024; disk(*):1024; ports(*):[31000-32000], allocated: ) on slave
> d4ff5e08-2202-4f3b-8fb2-5515adf9a97e-S0 from framework
> d4ff5e08-2202-4f3b-8fb2-5515adf9a97e-0000
> I1003 20:29:03.427348 31574 sched.cpp:641] Framework registered with
> d4ff5e08-2202-4f3b-8fb2-5515adf9a97e-0000
> I1003 20:29:03.427399 31574 sched.cpp:655] Scheduler::registered took 24077ns
> W1003 20:29:03.427441 31578 slave.cpp:2141] Ignoring updating pid for
> framework d4ff5e08-2202-4f3b-8fb2-5515adf9a97e-0000 because it does not exist
> W1003 20:29:03.427455 31570 master.cpp:2461] Ignoring deactivate framework
> message for framework d4ff5e08-2202-4f3b-8fb2-5515adf9a97e-0000 (default) at
> [email protected]:38504 because it
> is not expected from
> [email protected]:38504
> I1003 20:29:04.374158 31577 hierarchical.hpp:1423] No inverse offers to send
> out!
> I1003 20:29:04.374223 31577 hierarchical.hpp:1223] Performed allocation for 1
> slaves in 634304ns
> I1003 20:29:04.374565 31571 master.cpp:4967] Sending 1 offers to framework
> d4ff5e08-2202-4f3b-8fb2-5515adf9a97e-0000 (default) at
> [email protected]:38504
> I1003 20:29:04.375128 31576 sched.cpp:811] Scheduler::resourceOffers took
> 113401ns
> I1003 20:29:04.375712 31568 slave.cpp:4281] Received oversubscribable
> resources cpus(*){REV}:2 from the resource estimator
> I1003 20:29:04.375774 31568 slave.cpp:4304] Forwarding total oversubscribed
> resources cpus(*){REV}:2
> I1003 20:29:04.376061 31568 master.cpp:4272] Received update of slave
> d4ff5e08-2202-4f3b-8fb2-5515adf9a97e-S0 at slave(181)@172.17.5.73:38504
> (9efc27440ed0) with total oversubscribed resources cpus(*){REV}:2
> I1003 20:29:04.376384 31568 hierarchical.hpp:735] Slave
> d4ff5e08-2202-4f3b-8fb2-5515adf9a97e-S0 (9efc27440ed0) updated with
> oversubscribed resources cpus(*){REV}:2 (total: cpus(*):2; mem(*):1024;
> disk(*):1024; ports(*):[31000-32000]; cpus(*){REV}:2, allocated: cpus(*):2;
> mem(*):1024; disk(*):1024; ports(*):[31000-32000])
> I1003 20:29:04.376866 31568 hierarchical.hpp:1423] No inverse offers to send
> out!
> I1003 20:29:04.376895 31568 hierarchical.hpp:1241] Performed allocation for
> slave d4ff5e08-2202-4f3b-8fb2-5515adf9a97e-S0 in 470420ns
> I1003 20:29:04.377118 31570 master.cpp:4967] Sending 1 offers to framework
> d4ff5e08-2202-4f3b-8fb2-5515adf9a97e-0000 (default) at
> [email protected]:38504
> ../../src/tests/oversubscription_tests.cpp:973: Failure
> Mock function called more times than expected - returning directly.
> Function call: resourceOffers(0x7fff2c3f5b00, @0x7f36ef2a88f0 { 144-byte
> object <B0-C7 06-FD 36-7F 00-00 00-00 00-00 00-00 00-00 20-65 02-D0 36-7F
> 00-00 C0-CD 03-D0 36-7F 00-00 60-BA 03-D0 36-7F 00-00 10-CE 03-D0 36-7F 00-00
> 00-D5 02-D0 36-7F 00-00 60-B9 03-D0 36-7F 00-00 ... 00-00 00-00 00-00 00-00
> 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00
> 00-00 00-00 00-00 00-00 00-00 36-7F 00-00 00-00 00-00 00-00 00-00 00-00 00-00
> 1F-00 00-00> })
> Expected: to be called once
> Actual: called twice - over-saturated and active
> I1003 20:29:04.377485 31570 sched.cpp:811] Scheduler::resourceOffers took
> 146114ns
> I1003 20:29:05.375358 31580 hierarchical.hpp:1328] No resources available to
> allocate!
> I1003 20:29:05.375414 31580 hierarchical.hpp:1423] No inverse offers to send
> out!
> I1003 20:29:05.375428 31580 hierarchical.hpp:1223] Performed allocation for 1
> slaves in 542475ns
> 2015-10-03
> 20:29:06,037:31549(0x7f367ffff700):ZOO_ERROR@handle_socket_error_msg@1697:
> Socket [127.0.0.1:33701] zk retcode=-4, errno=111(Connection refused): server
> refused to accept the client
> I1003 20:29:06.377113 31572 hierarchical.hpp:1328] No resources available to
> allocate!
> I1003 20:29:06.377166 31572 hierarchical.hpp:1423] No inverse offers to send
> out!
> I1003 20:29:06.377181 31572 hierarchical.hpp:1223] Performed allocation for 1
> slaves in 565826ns
> I1003 20:29:07.378443 31580 hierarchical.hpp:1328] No resources available to
> allocate!
> I1003 20:29:07.378502 31580 hierarchical.hpp:1423] No inverse offers to send
> out!
> I1003 20:29:07.378520 31580 hierarchical.hpp:1223] Performed allocation for 1
> slaves in 567289ns
> I1003 20:29:08.379870 31575 hierarchical.hpp:1328] No resources available to
> allocate!
> I1003 20:29:08.379927 31575 hierarchical.hpp:1423] No inverse offers to send
> out!
> I1003 20:29:08.379940 31575 hierarchical.hpp:1223] Performed allocation for 1
> slaves in 572649ns
> I1003 20:29:08.408592 31578 sched.cpp:419] Ignoring authentication timeout
> because the driver is not running!
> 2015-10-03
> 20:29:09,374:31549(0x7f367ffff700):ZOO_ERROR@handle_socket_error_msg@1697:
> Socket [127.0.0.1:33701] zk retcode=-4, errno=111(Connection refused): server
> refused to accept the client
> I1003 20:29:09.381644 31578 hierarchical.hpp:1328] No resources available to
> allocate!
> I1003 20:29:09.381691 31578 hierarchical.hpp:1423] No inverse offers to send
> out!
> I1003 20:29:09.381705 31578 hierarchical.hpp:1223] Performed allocation for 1
> slaves in 540955ns
> I1003 20:29:10.383394 31580 hierarchical.hpp:1328] No resources available to
> allocate!
> I1003 20:29:10.383447 31580 hierarchical.hpp:1423] No inverse offers to send
> out!
> I1003 20:29:10.383460 31580 hierarchical.hpp:1223] Performed allocation for 1
> slaves in 500886ns
> I1003 20:29:11.384301 31578 hierarchical.hpp:1328] No resources available to
> allocate!
> I1003 20:29:11.384354 31578 hierarchical.hpp:1423] No inverse offers to send
> out!
> I1003 20:29:11.384367 31578 hierarchical.hpp:1223] Performed allocation for 1
> slaves in 524827ns
> I1003 20:29:12.386200 31577 hierarchical.hpp:1328] No resources available to
> allocate!
> I1003 20:29:12.386268 31577 hierarchical.hpp:1423] No inverse offers to send
> out!
> I1003 20:29:12.386294 31577 hierarchical.hpp:1223] Performed allocation for 1
> slaves in 805961ns
> 2015-10-03
> 20:29:12,710:31549(0x7f367ffff700):ZOO_ERROR@handle_socket_error_msg@1697:
> Socket [127.0.0.1:33701] zk retcode=-4, errno=111(Connection refused): server
> refused to accept the client
> I1003 20:29:13.387652 31574 hierarchical.hpp:1328] No resources available to
> allocate!
> I1003 20:29:13.387707 31574 hierarchical.hpp:1423] No inverse offers to send
> out!
> I1003 20:29:13.387720 31574 hierarchical.hpp:1223] Performed allocation for 1
> slaves in 519943ns
> I1003 20:29:14.389140 31571 hierarchical.hpp:1328] No resources available to
> allocate!
> I1003 20:29:14.389194 31571 hierarchical.hpp:1423] No inverse offers to send
> out!
> I1003 20:29:14.389206 31571 hierarchical.hpp:1223] Performed allocation for 1
> slaves in 512587ns
> I1003 20:29:15.390326 31574 hierarchical.hpp:1328] No resources available to
> allocate!
> I1003 20:29:15.390383 31574 hierarchical.hpp:1423] No inverse offers to send
> out!
> I1003 20:29:15.390396 31574 hierarchical.hpp:1223] Performed allocation for 1
> slaves in 527415ns
> 2015-10-03
> 20:29:16,047:31549(0x7f367ffff700):ZOO_ERROR@handle_socket_error_msg@1697:
> Socket [127.0.0.1:33701] zk retcode=-4, errno=111(Connection refused): server
> refused to accept the client
> I1003 20:29:16.391540 31575 hierarchical.hpp:1328] No resources available to
> allocate!
> I1003 20:29:16.391598 31575 hierarchical.hpp:1423] No inverse offers to send
> out!
> I1003 20:29:16.391618 31575 hierarchical.hpp:1223] Performed allocation for 1
> slaves in 556749ns
> I1003 20:29:17.392669 31578 hierarchical.hpp:1328] No resources available to
> allocate!
> I1003 20:29:17.392730 31578 hierarchical.hpp:1423] No inverse offers to send
> out!
> I1003 20:29:17.392745 31578 hierarchical.hpp:1223] Performed allocation for 1
> slaves in 534985ns
> I1003 20:29:18.394364 31571 hierarchical.hpp:1328] No resources available to
> allocate!
> I1003 20:29:18.394421 31571 hierarchical.hpp:1423] No inverse offers to send
> out!
> I1003 20:29:18.394435 31571 hierarchical.hpp:1223] Performed allocation for 1
> slaves in 560401ns
> I1003 20:29:18.420331 31580 slave.cpp:3138] Received ping from
> slave-observer(181)@172.17.5.73:38504
> I1003 20:29:19.376513 31578 slave.cpp:4267] Querying resource estimator for
> oversubscribable resources
> ../../src/tests/oversubscription_tests.cpp:910: Failure
> Mock function called more times than expected - taking default action
> specified at:
> ../../src/tests/mesos.hpp:798:
> Function call: oversubscribable()
> Returns: 16-byte object <E0-AA 45-02 00-00 00-00 80-6D 44-02 00-00
> 00-00>
> Expected: to be called once
> Actual: called twice - over-saturated and active
> ../../src/tests/oversubscription_tests.cpp:999: Failure
> Failed to wait 15secs for offers2
> I1003 20:29:19.380501 31572 master.cpp:1119] Framework
> d4ff5e08-2202-4f3b-8fb2-5515adf9a97e-0000 (default) at
> [email protected]:38504 disconnected
> I1003 20:29:19.380607 31572 master.cpp:2475] Disconnecting framework
> d4ff5e08-2202-4f3b-8fb2-5515adf9a97e-0000 (default) at
> [email protected]:38504
> I1003 20:29:19.380681 31572 master.cpp:2499] Deactivating framework
> d4ff5e08-2202-4f3b-8fb2-5515adf9a97e-0000 (default) at
> [email protected]:38504
> ../../src/tests/oversubscription_tests.cpp:995: Failure
> Actual function call count doesn't match EXPECT_CALL(sched2,
> resourceOffers(&driver2, _))...
> Expected: to be called at least once
> Actual: never called - unsatisfied and active
> W1003 20:29:19.381388 31572 master.hpp:1532] Master attempted to send message
> to disconnected framework d4ff5e08-2202-4f3b-8fb2-5515adf9a97e-0000 (default)
> at [email protected]:38504
> I1003 20:29:19.381559 31578 hierarchical.hpp:599] Deactivated framework
> d4ff5e08-2202-4f3b-8fb2-5515adf9a97e-0000
> 2015-10-03
> 20:29:19,381:31549(0x7f367ffff700):ZOO_ERROR@handle_socket_error_msg@1697:
> Socket [127.0.0.1:33701] zk retcode=-4, errno=111(Connection refused): server
> refused to accept the client
> W1003 20:29:19.382132 31572 master.hpp:1532] Master attempted to send message
> to disconnected framework d4ff5e08-2202-4f3b-8fb2-5515adf9a97e-0000 (default)
> at [email protected]:38504
> I1003 20:29:19.382289 31572 master.cpp:1143] Giving framework
> d4ff5e08-2202-4f3b-8fb2-5515adf9a97e-0000 (default) at
> [email protected]:38504 0ns to
> failover
> I1003 20:29:19.382238 31578 hierarchical.hpp:1103] Recovered cpus(*){REV}:2
> (total: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000];
> cpus(*){REV}:2, allocated: cpus(*):2; mem(*):1024; disk(*):1024;
> ports(*):[31000-32000]) on slave d4ff5e08-2202-4f3b-8fb2-5515adf9a97e-S0 from
> framework d4ff5e08-2202-4f3b-8fb2-5515adf9a97e-0000
> I1003 20:29:19.382781 31583 master.cpp:4815] Framework failover timeout,
> removing framework d4ff5e08-2202-4f3b-8fb2-5515adf9a97e-0000 (default) at
> [email protected]:38504
> I1003 20:29:19.382840 31583 master.cpp:5571] Removing framework
> d4ff5e08-2202-4f3b-8fb2-5515adf9a97e-0000 (default) at
> [email protected]:38504
> I1003 20:29:19.382984 31572 slave.cpp:1980] Asked to shut down framework
> d4ff5e08-2202-4f3b-8fb2-5515adf9a97e-0000 by [email protected]:38504
> W1003 20:29:19.383014 31572 slave.cpp:1995] Cannot shut down unknown
> framework d4ff5e08-2202-4f3b-8fb2-5515adf9a97e-0000
> I1003 20:29:19.383008 31578 hierarchical.hpp:1103] Recovered cpus(*):2;
> mem(*):1024; disk(*):1024; ports(*):[31000-32000] (total: cpus(*):2;
> mem(*):1024; disk(*):1024; ports(*):[31000-32000]; cpus(*){REV}:2, allocated:
> ) on slave d4ff5e08-2202-4f3b-8fb2-5515adf9a97e-S0 from framework
> d4ff5e08-2202-4f3b-8fb2-5515adf9a97e-0000
> I1003 20:29:19.383318 31573 hierarchical.hpp:552] Removed framework
> d4ff5e08-2202-4f3b-8fb2-5515adf9a97e-0000
> I1003 20:29:19.383466 31569 master.cpp:919] Master terminating
> I1003 20:29:19.383764 31573 hierarchical.hpp:706] Removed slave
> d4ff5e08-2202-4f3b-8fb2-5515adf9a97e-S0
> I1003 20:29:19.384182 31572 slave.cpp:3184] [email protected]:38504 exited
> W1003 20:29:19.384207 31572 slave.cpp:3187] Master disconnected! Waiting for
> a new master to be elected
> I1003 20:29:19.388171 31579 slave.cpp:585] Slave terminating
> [ FAILED ] OversubscriptionTest.UpdateAllocatorOnSchedulerFailover (16026
> ms)
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)