Benjamin Mahler created MESOS-1789: -------------------------------------- Summary: MasterTest.RecoveredSlaveReregisters is flaky. Key: MESOS-1789 URL: https://issues.apache.org/jira/browse/MESOS-1789 Project: Mesos Issue Type: Bug Components: test Reporter: Benjamin Mahler Priority: Minor
Seen flaky on a fedora 19 VM w/ clang. {noformat} [ RUN ] MasterTest.RecoveredSlaveReregisters Using temporary directory '/tmp/MasterTest_RecoveredSlaveReregisters_CHREru' I0910 23:37:24.522372 22914 leveldb.cpp:176] Opened db in 978us I0910 23:37:24.522948 22914 leveldb.cpp:183] Compacted db in 554320ns I0910 23:37:24.522981 22914 leveldb.cpp:198] Created db iterator in 15459ns I0910 23:37:24.523000 22914 leveldb.cpp:204] Seeked to beginning of db in 9593ns I0910 23:37:24.523020 22914 leveldb.cpp:273] Iterated through 0 keys in the db in 9137ns I0910 23:37:24.523043 22914 replica.cpp:741] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0910 23:37:24.525143 22935 recover.cpp:425] Starting replica recovery I0910 23:37:24.525266 22935 recover.cpp:451] Replica is in EMPTY status I0910 23:37:24.525774 22935 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request I0910 23:37:24.525871 22935 recover.cpp:188] Received a recover response from a replica in EMPTY status I0910 23:37:24.526028 22935 recover.cpp:542] Updating replica status to STARTING I0910 23:37:24.526180 22935 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 83617ns I0910 23:37:24.526211 22935 replica.cpp:320] Persisted replica status to STARTING I0910 23:37:24.526283 22935 recover.cpp:451] Replica is in STARTING status I0910 23:37:24.526725 22935 replica.cpp:638] Replica in STARTING status received a broadcasted recover request I0910 23:37:24.526813 22935 recover.cpp:188] Received a recover response from a replica in STARTING status I0910 23:37:24.526964 22935 recover.cpp:542] Updating replica status to VOTING I0910 23:37:24.527061 22935 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 44802ns I0910 23:37:24.527091 22935 replica.cpp:320] Persisted replica status to VOTING I0910 23:37:24.527139 22935 recover.cpp:556] Successfully joined the Paxos group I0910 23:37:24.527230 22935 recover.cpp:440] Recover process terminated I0910 23:37:24.527748 22928 master.cpp:286] Master 20140910-233724-2272962752-36006-22914 (fedora-19) started on 192.168.122.135:36006 I0910 23:37:24.527807 22928 master.cpp:332] Master only allowing authenticated frameworks to register I0910 23:37:24.527827 22928 master.cpp:337] Master only allowing authenticated slaves to register I0910 23:37:24.527849 22928 credentials.hpp:36] Loading credentials for authentication from '/tmp/MasterTest_RecoveredSlaveReregisters_CHREru/credentials' I0910 23:37:24.528890 22928 master.cpp:366] Authorization enabled I0910 23:37:24.529822 22928 hierarchical_allocator_process.hpp:299] Initializing hierarchical allocator process with master : master@192.168.122.135:36006 I0910 23:37:24.529903 22928 master.cpp:120] No whitelist given. Advertising offers for all slaves I0910 23:37:24.530275 22928 master.cpp:1212] The newly elected leader is master@192.168.122.135:36006 with id 20140910-233724-2272962752-36006-22914 I0910 23:37:24.530311 22928 master.cpp:1225] Elected as the leading master! I0910 23:37:24.530328 22928 master.cpp:1043] Recovering from registrar I0910 23:37:24.530426 22928 registrar.cpp:313] Recovering registrar I0910 23:37:24.530993 22928 log.cpp:656] Attempting to start the writer I0910 23:37:24.531601 22928 replica.cpp:474] Replica received implicit promise request with proposal 1 I0910 23:37:24.531677 22928 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 60319ns I0910 23:37:24.531707 22928 replica.cpp:342] Persisted promised to 1 I0910 23:37:24.532016 22928 coordinator.cpp:230] Coordinator attemping to fill missing position I0910 23:37:24.532691 22928 replica.cpp:375] Replica received explicit promise request for position 0 with proposal 2 I0910 23:37:24.532752 22928 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 45735ns I0910 23:37:24.532783 22928 replica.cpp:676] Persisted action at 0 I0910 23:37:24.533252 22928 replica.cpp:508] Replica received write request for position 0 I0910 23:37:24.533299 22928 leveldb.cpp:438] Reading position from leveldb took 34066ns I0910 23:37:24.533354 22928 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 37637ns I0910 23:37:24.533381 22928 replica.cpp:676] Persisted action at 0 I0910 23:37:24.533701 22928 replica.cpp:655] Replica received learned notice for position 0 I0910 23:37:24.533757 22928 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 42842ns I0910 23:37:24.533785 22928 replica.cpp:676] Persisted action at 0 I0910 23:37:24.533804 22928 replica.cpp:661] Replica learned NOP action at position 0 I0910 23:37:24.534191 22928 log.cpp:672] Writer started with ending position 0 I0910 23:37:24.534695 22928 leveldb.cpp:438] Reading position from leveldb took 36602ns I0910 23:37:24.536397 22928 registrar.cpp:346] Successfully fetched the registry (0B) I0910 23:37:24.536440 22928 registrar.cpp:422] Attempting to update the 'registry' I0910 23:37:24.538182 22928 log.cpp:680] Attempting to append 131 bytes to the log I0910 23:37:24.538269 22928 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 1 I0910 23:37:24.538710 22928 replica.cpp:508] Replica received write request for position 1 I0910 23:37:24.538791 22928 leveldb.cpp:343] Persisting action (150 bytes) to leveldb took 63482ns I0910 23:37:24.538822 22928 replica.cpp:676] Persisted action at 1 I0910 23:37:24.539145 22928 replica.cpp:655] Replica received learned notice for position 1 I0910 23:37:24.539204 22928 leveldb.cpp:343] Persisting action (152 bytes) to leveldb took 44797ns I0910 23:37:24.539233 22928 replica.cpp:676] Persisted action at 1 I0910 23:37:24.539252 22928 replica.cpp:661] Replica learned APPEND action at position 1 I0910 23:37:24.539824 22928 registrar.cpp:479] Successfully updated 'registry' I0910 23:37:24.539904 22928 registrar.cpp:372] Successfully recovered registrar I0910 23:37:24.539988 22928 log.cpp:699] Attempting to truncate the log to 1 I0910 23:37:24.540114 22928 master.cpp:1070] Recovered 0 slaves from the Registry (95B) ; allowing 10mins for slaves to re-register I0910 23:37:24.540201 22928 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 2 I0910 23:37:24.540669 22928 replica.cpp:508] Replica received write request for position 2 I0910 23:37:24.540729 22928 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 44502ns I0910 23:37:24.540757 22928 replica.cpp:676] Persisted action at 2 I0910 23:37:24.541095 22928 replica.cpp:655] Replica received learned notice for position 2 I0910 23:37:24.541153 22928 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 43832ns I0910 23:37:24.541194 22928 leveldb.cpp:401] Deleting ~1 keys from leveldb took 29490ns I0910 23:37:24.541220 22928 replica.cpp:676] Persisted action at 2 I0910 23:37:24.541239 22928 replica.cpp:661] Replica learned TRUNCATE action at position 2 I0910 23:37:24.553033 22914 containerizer.cpp:89] Using isolation: posix/cpu,posix/mem I0910 23:37:24.556795 22928 slave.cpp:169] Slave started on 59)@192.168.122.135:36006 I0910 23:37:24.556851 22928 credentials.hpp:84] Loading credential for authentication from '/tmp/MasterTest_RecoveredSlaveReregisters_kkerkd/credential' I0910 23:37:24.556998 22928 slave.cpp:276] Slave using credential for: test-principal I0910 23:37:24.557144 22928 slave.cpp:289] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0910 23:37:24.557618 22928 slave.cpp:317] Slave hostname: fedora-19 I0910 23:37:24.557644 22928 slave.cpp:318] Slave checkpoint: true I0910 23:37:24.558248 22928 state.cpp:33] Recovering state from '/tmp/MasterTest_RecoveredSlaveReregisters_kkerkd/meta' I0910 23:37:24.558413 22928 status_update_manager.cpp:193] Recovering status update manager I0910 23:37:24.558584 22928 containerizer.cpp:252] Recovering containerizer I0910 23:37:24.559092 22928 slave.cpp:3219] Finished recovery I0910 23:37:24.559608 22928 slave.cpp:600] New master detected at master@192.168.122.135:36006 I0910 23:37:24.559656 22928 slave.cpp:674] Authenticating with master master@192.168.122.135:36006 I0910 23:37:24.559743 22928 slave.cpp:647] Detecting new master I0910 23:37:24.559816 22928 status_update_manager.cpp:167] New master detected at master@192.168.122.135:36006 I0910 23:37:24.559878 22928 authenticatee.hpp:128] Creating new client SASL connection I0910 23:37:24.560583 22928 master.cpp:3653] Authenticating slave(59)@192.168.122.135:36006 I0910 23:37:24.560736 22928 authenticator.hpp:156] Creating new server SASL connection I0910 23:37:24.561318 22928 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5 I0910 23:37:24.561362 22928 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5' I0910 23:37:24.561420 22928 authenticator.hpp:262] Received SASL authentication start I0910 23:37:24.561486 22928 authenticator.hpp:384] Authentication requires more steps I0910 23:37:24.561590 22928 authenticatee.hpp:265] Received SASL authentication step I0910 23:37:24.561666 22928 authenticator.hpp:290] Received SASL authentication step I0910 23:37:24.561702 22928 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0910 23:37:24.561720 22928 auxprop.cpp:153] Looking up auxiliary property '*userPassword' I0910 23:37:24.561741 22928 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0910 23:37:24.561764 22928 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0910 23:37:24.561781 22928 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0910 23:37:24.561795 22928 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0910 23:37:24.561817 22928 authenticator.hpp:376] Authentication success I0910 23:37:24.561874 22928 authenticatee.hpp:305] Authentication success I0910 23:37:24.561924 22928 master.cpp:3693] Successfully authenticated principal 'test-principal' at slave(59)@192.168.122.135:36006 I0910 23:37:24.562062 22928 slave.cpp:731] Successfully authenticated with master master@192.168.122.135:36006 Itests/master_tests.cpp:1703: Failure 0910 23:37:24.562134 22928 slave.cpp:994] Will retry registration in 12.871919ms if necessary I0910 23:37:24.562263 22935 master.cpp:2843] Registering slave at slave(59)@192.168.122.135:36006 (fedora-19) with id 20140910-233724-2272962752-36006-22914-0 I0910 23:37:25.531041 22930 hierarchical_allocator_process.hpp:697] No resources available to allocate! I0910 23:37:35.832602 22930 hierarchical_allocator_process.hpp:659] Performed allocation for 0 slaves in 10.301614677secs I0910 23:37:35.832757 22930 registrar.cpp:422] Attempting to update the 'registry' I0910 23:37:29.530727 22929 master.cpp:120] No whitelist given. Advertising offers for all slaves Failed to wait 10secs for slaveRegisteredMessage I0910 23:37:35.834206 22935 master.cpp:650] Master terminating I0910 23:37:35.834398 22935 slave.cpp:2378] master@192.168.122.135:36006 exited W0910 23:37:35.834429 22935 slave.cpp:2381] Master disconnected! Waiting for a new master to be elected I0910 23:37:35.835822 22929 log.cpp:680] Attempting to append 312 bytes to the log I0910 23:37:35.835921 22929 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 3 I0910 23:37:35.836328 22929 replica.cpp:508] Replica received write request for position 3 I0910 23:37:35.836452 22929 leveldb.cpp:343] Persisting action (331 bytes) to leveldb took 95913ns I0910 23:37:35.836467 22929 replica.cpp:676] Persisted action at 3 I0910 23:37:35.836714 22929 replica.cpp:655] Replica received learned notice for position 3 I0910 23:37:35.836751 22929 leveldb.cpp:343] Persisting action (333 bytes) to leveldb took 25990ns I0910 23:37:35.836762 22929 replica.cpp:676] Persisted action at 3 I0910 23:37:35.836771 22929 replica.cpp:661] Replica learned APPEND action at position 3 I0910 23:37:35.837183 22929 log.cpp:699] Attempting to truncate the log to 3 I0910 23:37:35.837225 22929 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 4 I0910 23:37:35.837359 22929 replica.cpp:508] Replica received write request for position 4 I0910 23:37:35.837393 22929 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 22808ns I0910 23:37:35.837404 22929 replica.cpp:676] Persisted action at 4 I0910 23:37:35.837700 22931 replica.cpp:655] Replica received learned notice for position 4 I0910 23:37:35.837745 22931 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 33314ns I0910 23:37:35.837765 22931 leveldb.cpp:401] Deleting ~2 keys from leveldb took 13548ns I0910 23:37:35.837775 22931 replica.cpp:676] Persisted action at 4 I0910 23:37:35.837782 22931 replica.cpp:661] Replica learned TRUNCATE action at position 4 I0910 23:37:35.839639 22914 slave.cpp:477] Slave terminating ../3rdparty/libprocess/include/process/gmock.hpp:282: Failure Actual function call count doesn't match EXPECT_CALL(filter->mock, filter(testing::A<const MessageEvent&>()))... Expected args: message matcher (8-byte object <28-3C 02-68 ED-7F 00-00>, 1, 1-byte object <60>) Expected: to be called once Actual: never called - unsatisfied and active [ FAILED ] MasterTest.RecoveredSlaveReregisters (11320 ms) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)