[ 
https://issues.apache.org/jira/browse/MESOS-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler resolved MESOS-1789.
------------------------------------
    Resolution: Cannot Reproduce

Looks like the disk on this VM may have been blocked for quite some time, some 
strange interleaving of time in the log lines. Will re-open if this crops up 
again.

> MasterTest.RecoveredSlaveReregisters is flaky.
> ----------------------------------------------
>
>                 Key: MESOS-1789
>                 URL: https://issues.apache.org/jira/browse/MESOS-1789
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>            Reporter: Benjamin Mahler
>            Priority: Minor
>
> Seen flaky on a fedora 19 VM w/ clang.
> {noformat}
> [ RUN      ] MasterTest.RecoveredSlaveReregisters
> Using temporary directory '/tmp/MasterTest_RecoveredSlaveReregisters_CHREru'
> I0910 23:37:24.522372 22914 leveldb.cpp:176] Opened db in 978us
> I0910 23:37:24.522948 22914 leveldb.cpp:183] Compacted db in 554320ns
> I0910 23:37:24.522981 22914 leveldb.cpp:198] Created db iterator in 15459ns
> I0910 23:37:24.523000 22914 leveldb.cpp:204] Seeked to beginning of db in 
> 9593ns
> I0910 23:37:24.523020 22914 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 9137ns
> I0910 23:37:24.523043 22914 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0910 23:37:24.525143 22935 recover.cpp:425] Starting replica recovery
> I0910 23:37:24.525266 22935 recover.cpp:451] Replica is in EMPTY status
> I0910 23:37:24.525774 22935 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I0910 23:37:24.525871 22935 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I0910 23:37:24.526028 22935 recover.cpp:542] Updating replica status to 
> STARTING
> I0910 23:37:24.526180 22935 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 83617ns
> I0910 23:37:24.526211 22935 replica.cpp:320] Persisted replica status to 
> STARTING
> I0910 23:37:24.526283 22935 recover.cpp:451] Replica is in STARTING status
> I0910 23:37:24.526725 22935 replica.cpp:638] Replica in STARTING status 
> received a broadcasted recover request
> I0910 23:37:24.526813 22935 recover.cpp:188] Received a recover response from 
> a replica in STARTING status
> I0910 23:37:24.526964 22935 recover.cpp:542] Updating replica status to VOTING
> I0910 23:37:24.527061 22935 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 44802ns
> I0910 23:37:24.527091 22935 replica.cpp:320] Persisted replica status to 
> VOTING
> I0910 23:37:24.527139 22935 recover.cpp:556] Successfully joined the Paxos 
> group
> I0910 23:37:24.527230 22935 recover.cpp:440] Recover process terminated
> I0910 23:37:24.527748 22928 master.cpp:286] Master 
> 20140910-233724-2272962752-36006-22914 (fedora-19) started on 
> 192.168.122.135:36006
> I0910 23:37:24.527807 22928 master.cpp:332] Master only allowing 
> authenticated frameworks to register
> I0910 23:37:24.527827 22928 master.cpp:337] Master only allowing 
> authenticated slaves to register
> I0910 23:37:24.527849 22928 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/MasterTest_RecoveredSlaveReregisters_CHREru/credentials'
> I0910 23:37:24.528890 22928 master.cpp:366] Authorization enabled
> I0910 23:37:24.529822 22928 hierarchical_allocator_process.hpp:299] 
> Initializing hierarchical allocator process with master : 
> master@192.168.122.135:36006
> I0910 23:37:24.529903 22928 master.cpp:120] No whitelist given. Advertising 
> offers for all slaves
> I0910 23:37:24.530275 22928 master.cpp:1212] The newly elected leader is 
> master@192.168.122.135:36006 with id 20140910-233724-2272962752-36006-22914
> I0910 23:37:24.530311 22928 master.cpp:1225] Elected as the leading master!
> I0910 23:37:24.530328 22928 master.cpp:1043] Recovering from registrar
> I0910 23:37:24.530426 22928 registrar.cpp:313] Recovering registrar
> I0910 23:37:24.530993 22928 log.cpp:656] Attempting to start the writer
> I0910 23:37:24.531601 22928 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I0910 23:37:24.531677 22928 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 60319ns
> I0910 23:37:24.531707 22928 replica.cpp:342] Persisted promised to 1
> I0910 23:37:24.532016 22928 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I0910 23:37:24.532691 22928 replica.cpp:375] Replica received explicit 
> promise request for position 0 with proposal 2
> I0910 23:37:24.532752 22928 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 45735ns
> I0910 23:37:24.532783 22928 replica.cpp:676] Persisted action at 0
> I0910 23:37:24.533252 22928 replica.cpp:508] Replica received write request 
> for position 0
> I0910 23:37:24.533299 22928 leveldb.cpp:438] Reading position from leveldb 
> took 34066ns
> I0910 23:37:24.533354 22928 leveldb.cpp:343] Persisting action (14 bytes) to 
> leveldb took 37637ns
> I0910 23:37:24.533381 22928 replica.cpp:676] Persisted action at 0
> I0910 23:37:24.533701 22928 replica.cpp:655] Replica received learned notice 
> for position 0
> I0910 23:37:24.533757 22928 leveldb.cpp:343] Persisting action (16 bytes) to 
> leveldb took 42842ns
> I0910 23:37:24.533785 22928 replica.cpp:676] Persisted action at 0
> I0910 23:37:24.533804 22928 replica.cpp:661] Replica learned NOP action at 
> position 0
> I0910 23:37:24.534191 22928 log.cpp:672] Writer started with ending position 0
> I0910 23:37:24.534695 22928 leveldb.cpp:438] Reading position from leveldb 
> took 36602ns
> I0910 23:37:24.536397 22928 registrar.cpp:346] Successfully fetched the 
> registry (0B)
> I0910 23:37:24.536440 22928 registrar.cpp:422] Attempting to update the 
> 'registry'
> I0910 23:37:24.538182 22928 log.cpp:680] Attempting to append 131 bytes to 
> the log
> I0910 23:37:24.538269 22928 coordinator.cpp:340] Coordinator attempting to 
> write APPEND action at position 1
> I0910 23:37:24.538710 22928 replica.cpp:508] Replica received write request 
> for position 1
> I0910 23:37:24.538791 22928 leveldb.cpp:343] Persisting action (150 bytes) to 
> leveldb took 63482ns
> I0910 23:37:24.538822 22928 replica.cpp:676] Persisted action at 1
> I0910 23:37:24.539145 22928 replica.cpp:655] Replica received learned notice 
> for position 1
> I0910 23:37:24.539204 22928 leveldb.cpp:343] Persisting action (152 bytes) to 
> leveldb took 44797ns
> I0910 23:37:24.539233 22928 replica.cpp:676] Persisted action at 1
> I0910 23:37:24.539252 22928 replica.cpp:661] Replica learned APPEND action at 
> position 1
> I0910 23:37:24.539824 22928 registrar.cpp:479] Successfully updated 'registry'
> I0910 23:37:24.539904 22928 registrar.cpp:372] Successfully recovered 
> registrar
> I0910 23:37:24.539988 22928 log.cpp:699] Attempting to truncate the log to 1
> I0910 23:37:24.540114 22928 master.cpp:1070] Recovered 0 slaves from the 
> Registry (95B) ; allowing 10mins for slaves to re-register
> I0910 23:37:24.540201 22928 coordinator.cpp:340] Coordinator attempting to 
> write TRUNCATE action at position 2
> I0910 23:37:24.540669 22928 replica.cpp:508] Replica received write request 
> for position 2
> I0910 23:37:24.540729 22928 leveldb.cpp:343] Persisting action (16 bytes) to 
> leveldb took 44502ns
> I0910 23:37:24.540757 22928 replica.cpp:676] Persisted action at 2
> I0910 23:37:24.541095 22928 replica.cpp:655] Replica received learned notice 
> for position 2
> I0910 23:37:24.541153 22928 leveldb.cpp:343] Persisting action (18 bytes) to 
> leveldb took 43832ns
> I0910 23:37:24.541194 22928 leveldb.cpp:401] Deleting ~1 keys from leveldb 
> took 29490ns
> I0910 23:37:24.541220 22928 replica.cpp:676] Persisted action at 2
> I0910 23:37:24.541239 22928 replica.cpp:661] Replica learned TRUNCATE action 
> at position 2
> I0910 23:37:24.553033 22914 containerizer.cpp:89] Using isolation: 
> posix/cpu,posix/mem
> I0910 23:37:24.556795 22928 slave.cpp:169] Slave started on 
> 59)@192.168.122.135:36006
> I0910 23:37:24.556851 22928 credentials.hpp:84] Loading credential for 
> authentication from 
> '/tmp/MasterTest_RecoveredSlaveReregisters_kkerkd/credential'
> I0910 23:37:24.556998 22928 slave.cpp:276] Slave using credential for: 
> test-principal
> I0910 23:37:24.557144 22928 slave.cpp:289] Slave resources: cpus(*):2; 
> mem(*):1024; disk(*):1024; ports(*):[31000-32000]
> I0910 23:37:24.557618 22928 slave.cpp:317] Slave hostname: fedora-19
> I0910 23:37:24.557644 22928 slave.cpp:318] Slave checkpoint: true
> I0910 23:37:24.558248 22928 state.cpp:33] Recovering state from 
> '/tmp/MasterTest_RecoveredSlaveReregisters_kkerkd/meta'
> I0910 23:37:24.558413 22928 status_update_manager.cpp:193] Recovering status 
> update manager
> I0910 23:37:24.558584 22928 containerizer.cpp:252] Recovering containerizer
> I0910 23:37:24.559092 22928 slave.cpp:3219] Finished recovery
> I0910 23:37:24.559608 22928 slave.cpp:600] New master detected at 
> master@192.168.122.135:36006
> I0910 23:37:24.559656 22928 slave.cpp:674] Authenticating with master 
> master@192.168.122.135:36006
> I0910 23:37:24.559743 22928 slave.cpp:647] Detecting new master
> I0910 23:37:24.559816 22928 status_update_manager.cpp:167] New master 
> detected at master@192.168.122.135:36006
> I0910 23:37:24.559878 22928 authenticatee.hpp:128] Creating new client SASL 
> connection
> I0910 23:37:24.560583 22928 master.cpp:3653] Authenticating 
> slave(59)@192.168.122.135:36006
> I0910 23:37:24.560736 22928 authenticator.hpp:156] Creating new server SASL 
> connection
> I0910 23:37:24.561318 22928 authenticatee.hpp:219] Received SASL 
> authentication mechanisms: CRAM-MD5
> I0910 23:37:24.561362 22928 authenticatee.hpp:245] Attempting to authenticate 
> with mechanism 'CRAM-MD5'
> I0910 23:37:24.561420 22928 authenticator.hpp:262] Received SASL 
> authentication start
> I0910 23:37:24.561486 22928 authenticator.hpp:384] Authentication requires 
> more steps
> I0910 23:37:24.561590 22928 authenticatee.hpp:265] Received SASL 
> authentication step
> I0910 23:37:24.561666 22928 authenticator.hpp:290] Received SASL 
> authentication step
> I0910 23:37:24.561702 22928 auxprop.cpp:81] Request to lookup properties for 
> user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' 
> SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
> SASL_AUXPROP_AUTHZID: false 
> I0910 23:37:24.561720 22928 auxprop.cpp:153] Looking up auxiliary property 
> '*userPassword'
> I0910 23:37:24.561741 22928 auxprop.cpp:153] Looking up auxiliary property 
> '*cmusaslsecretCRAM-MD5'
> I0910 23:37:24.561764 22928 auxprop.cpp:81] Request to lookup properties for 
> user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' 
> SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
> SASL_AUXPROP_AUTHZID: true 
> I0910 23:37:24.561781 22928 auxprop.cpp:103] Skipping auxiliary property 
> '*userPassword' since SASL_AUXPROP_AUTHZID == true
> I0910 23:37:24.561795 22928 auxprop.cpp:103] Skipping auxiliary property 
> '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
> I0910 23:37:24.561817 22928 authenticator.hpp:376] Authentication success
> I0910 23:37:24.561874 22928 authenticatee.hpp:305] Authentication success
> I0910 23:37:24.561924 22928 master.cpp:3693] Successfully authenticated 
> principal 'test-principal' at slave(59)@192.168.122.135:36006
> I0910 23:37:24.562062 22928 slave.cpp:731] Successfully authenticated with 
> master master@192.168.122.135:36006
> Itests/master_tests.cpp:1703: Failure
> 0910 23:37:24.562134 22928 slave.cpp:994] Will retry registration in 
> 12.871919ms if necessary
> I0910 23:37:24.562263 22935 master.cpp:2843] Registering slave at 
> slave(59)@192.168.122.135:36006 (fedora-19) with id 
> 20140910-233724-2272962752-36006-22914-0
> I0910 23:37:25.531041 22930 hierarchical_allocator_process.hpp:697] No 
> resources available to allocate!
> I0910 23:37:35.832602 22930 hierarchical_allocator_process.hpp:659] Performed 
> allocation for 0 slaves in 10.301614677secs
> I0910 23:37:35.832757 22930 registrar.cpp:422] Attempting to update the 
> 'registry'
> I0910 23:37:29.530727 22929 master.cpp:120] No whitelist given. Advertising 
> offers for all slaves
> Failed to wait 10secs for slaveRegisteredMessage
> I0910 23:37:35.834206 22935 master.cpp:650] Master terminating
> I0910 23:37:35.834398 22935 slave.cpp:2378] master@192.168.122.135:36006 
> exited
> W0910 23:37:35.834429 22935 slave.cpp:2381] Master disconnected! Waiting for 
> a new master to be elected
> I0910 23:37:35.835822 22929 log.cpp:680] Attempting to append 312 bytes to 
> the log
> I0910 23:37:35.835921 22929 coordinator.cpp:340] Coordinator attempting to 
> write APPEND action at position 3
> I0910 23:37:35.836328 22929 replica.cpp:508] Replica received write request 
> for position 3
> I0910 23:37:35.836452 22929 leveldb.cpp:343] Persisting action (331 bytes) to 
> leveldb took 95913ns
> I0910 23:37:35.836467 22929 replica.cpp:676] Persisted action at 3
> I0910 23:37:35.836714 22929 replica.cpp:655] Replica received learned notice 
> for position 3
> I0910 23:37:35.836751 22929 leveldb.cpp:343] Persisting action (333 bytes) to 
> leveldb took 25990ns
> I0910 23:37:35.836762 22929 replica.cpp:676] Persisted action at 3
> I0910 23:37:35.836771 22929 replica.cpp:661] Replica learned APPEND action at 
> position 3
> I0910 23:37:35.837183 22929 log.cpp:699] Attempting to truncate the log to 3
> I0910 23:37:35.837225 22929 coordinator.cpp:340] Coordinator attempting to 
> write TRUNCATE action at position 4
> I0910 23:37:35.837359 22929 replica.cpp:508] Replica received write request 
> for position 4
> I0910 23:37:35.837393 22929 leveldb.cpp:343] Persisting action (16 bytes) to 
> leveldb took 22808ns
> I0910 23:37:35.837404 22929 replica.cpp:676] Persisted action at 4
> I0910 23:37:35.837700 22931 replica.cpp:655] Replica received learned notice 
> for position 4
> I0910 23:37:35.837745 22931 leveldb.cpp:343] Persisting action (18 bytes) to 
> leveldb took 33314ns
> I0910 23:37:35.837765 22931 leveldb.cpp:401] Deleting ~2 keys from leveldb 
> took 13548ns
> I0910 23:37:35.837775 22931 replica.cpp:676] Persisted action at 4
> I0910 23:37:35.837782 22931 replica.cpp:661] Replica learned TRUNCATE action 
> at position 4
> I0910 23:37:35.839639 22914 slave.cpp:477] Slave terminating
> ../3rdparty/libprocess/include/process/gmock.hpp:282: Failure
> Actual function call count doesn't match EXPECT_CALL(filter->mock, 
> filter(testing::A<const MessageEvent&>()))...
>     Expected args: message matcher (8-byte object <28-3C 02-68 ED-7F 00-00>, 
> 1, 1-byte object <60>)
>          Expected: to be called once
>            Actual: never called - unsatisfied and active
> [  FAILED  ] MasterTest.RecoveredSlaveReregisters (11320 ms)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to