Vinod Kone created MESOS-1482:
---------------------------------

             Summary: SlaveRecoveryTest/0.RecoverTerminatedExecutor fails on 
registry store
                 Key: MESOS-1482
                 URL: https://issues.apache.org/jira/browse/MESOS-1482
             Project: Mesos
          Issue Type: Bug
          Components: test
            Reporter: Vinod Kone
            Assignee: Benjamin Mahler


[ RUN      ] SlaveRecoveryTest/0.RecoverTerminatedExecutor
Using temporary directory 
'/tmp/SlaveRecoveryTest_0_RecoverTerminatedExecutor_YQ24za'
I0612 14:22:27.358382  5908 leveldb.cpp:176] Opened db in 767016ns
I0612 14:22:27.359216  5908 leveldb.cpp:183] Compacted db in 222074ns
I0612 14:22:27.359726  5908 leveldb.cpp:198] Created db iterator in 5441ns
I0612 14:22:27.360497  5908 leveldb.cpp:204] Seeked to beginning of db in 1303ns
I0612 14:22:27.360986  5908 leveldb.cpp:273] Iterated through 0 keys in the db 
in 755ns
I0612 14:22:27.361481  5908 replica.cpp:741] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I0612 14:22:27.362094  5927 recover.cpp:425] Starting replica recovery
I0612 14:22:27.362175  5927 recover.cpp:451] Replica is in EMPTY status
I0612 14:22:27.362709  5934 replica.cpp:638] Replica in EMPTY status received a 
broadcasted recover request
I0612 14:22:27.362819  5927 recover.cpp:188] Received a recover response from a 
replica in EMPTY status
I0612 14:22:27.362912  5927 recover.cpp:542] Updating replica status to STARTING
I0612 14:22:27.363024  5927 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 41670ns
I0612 14:22:27.363039  5927 replica.cpp:320] Persisted replica status to 
STARTING
I0612 14:22:27.363092  5927 recover.cpp:451] Replica is in STARTING status
I0612 14:22:27.363404  5927 replica.cpp:638] Replica in STARTING status 
received a broadcasted recover request
I0612 14:22:27.363462  5927 recover.cpp:188] Received a recover response from a 
replica in STARTING status
I0612 14:22:27.363559  5927 recover.cpp:542] Updating replica status to VOTING
I0612 14:22:27.363622  5927 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 16124ns
I0612 14:22:27.363634  5927 replica.cpp:320] Persisted replica status to VOTING
I0612 14:22:27.363663  5927 recover.cpp:556] Successfully joined the Paxos group
I0612 14:22:27.363708  5927 recover.cpp:440] Recover process terminated
I0612 14:22:27.368589  5930 master.cpp:280] Master 
20140612-142227-2272962752-41488-5908 (fedora-19) started on 
192.168.122.135:41488
I0612 14:22:27.368618  5930 master.cpp:317] Master only allowing authenticated 
frameworks to register
I0612 14:22:27.368626  5930 master.cpp:322] Master only allowing authenticated 
slaves to register
I0612 14:22:27.368634  5930 credentials.hpp:35] Loading credentials for 
authentication from 
'/tmp/SlaveRecoveryTest_0_RecoverTerminatedExecutor_YQ24za/credentials'
I0612 14:22:27.368711  5930 master.cpp:348] Authorization enabled
I0612 14:22:27.369092  5930 hierarchical_allocator_process.hpp:301] 
Initializing hierarchical allocator process with master : 
[email protected]:41488
I0612 14:22:27.369130  5930 master.cpp:114] No whitelist given. Advertising 
offers for all slaves
I0612 14:22:27.369279  5930 master.cpp:961] The newly elected leader is 
[email protected]:41488 with id 20140612-142227-2272962752-41488-5908
I0612 14:22:27.369593  5930 master.cpp:974] Elected as the leading master!
I0612 14:22:27.369607  5930 master.cpp:792] Recovering from registrar
I0612 14:22:27.369669  5930 registrar.cpp:313] Recovering registrar
I0612 14:22:27.369881  5930 log.cpp:656] Attempting to start the writer
I0612 14:22:27.370656  5930 replica.cpp:474] Replica received implicit promise 
request with proposal 1
I0612 14:22:27.371258  5930 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 25467ns
I0612 14:22:27.371793  5930 replica.cpp:342] Persisted promised to 1
I0612 14:22:27.372181  5930 coordinator.cpp:230] Coordinator attemping to fill 
missing position
I0612 14:22:27.373234  5932 replica.cpp:375] Replica received explicit promise 
request for position 0 with proposal 2
I0612 14:22:27.373282  5932 leveldb.cpp:343] Persisting action (8 bytes) to 
leveldb took 24371ns
I0612 14:22:27.373404  5932 replica.cpp:676] Persisted action at 0
I0612 14:22:27.373643  5932 replica.cpp:508] Replica received write request for 
position 0
I0612 14:22:27.373673  5932 leveldb.cpp:438] Reading position from leveldb took 
13891ns
I0612 14:22:27.373700  5932 leveldb.cpp:343] Persisting action (14 bytes) to 
leveldb took 13231ns
I0612 14:22:27.373710  5932 replica.cpp:676] Persisted action at 0
I0612 14:22:27.373826  5932 replica.cpp:655] Replica received learned notice 
for position 0
I0612 14:22:27.373852  5932 leveldb.cpp:343] Persisting action (16 bytes) to 
leveldb took 13032ns
I0612 14:22:27.373862  5932 replica.cpp:676] Persisted action at 0
I0612 14:22:27.373869  5932 replica.cpp:661] Replica learned NOP action at 
position 0
I0612 14:22:27.374548  5930 log.cpp:672] Writer started with ending position 0
I0612 14:22:27.375668  5928 leveldb.cpp:438] Reading position from leveldb took 
12278ns
I0612 14:22:27.381489  5933 registrar.cpp:346] Successfully fetched the 
registry (0B)
I0612 14:22:27.381528  5933 registrar.cpp:422] Attempting to update the 
'registry'
I0612 14:22:27.383139  5933 log.cpp:680] Attempting to append 130 bytes to the 
log
I0612 14:22:27.383199  5933 coordinator.cpp:340] Coordinator attempting to 
write APPEND action at position 1
I0612 14:22:27.383883  5931 replica.cpp:508] Replica received write request for 
position 1
I0612 14:22:27.383942  5931 leveldb.cpp:343] Persisting action (149 bytes) to 
leveldb took 33381ns
I0612 14:22:27.383955  5931 replica.cpp:676] Persisted action at 1
I0612 14:22:27.384624  5927 replica.cpp:655] Replica received learned notice 
for position 1
I0612 14:22:27.384665  5927 leveldb.cpp:343] Persisting action (151 bytes) to 
leveldb took 18894ns
I0612 14:22:27.384675  5927 replica.cpp:676] Persisted action at 1
I0612 14:22:27.384683  5927 replica.cpp:661] Replica learned APPEND action at 
position 1
I0612 14:22:27.387426  5931 registrar.cpp:479] Successfully updated 'registry'
I0612 14:22:27.387465  5931 registrar.cpp:372] Successfully recovered registrar
I0612 14:22:27.387532  5931 master.cpp:819] Recovered 0 slaves from the 
Registry (94B) ; allowing 10mins for slaves to re-register
I0612 14:22:27.388017  5932 log.cpp:699] Attempting to truncate the log to 1
I0612 14:22:27.388064  5932 coordinator.cpp:340] Coordinator attempting to 
write TRUNCATE action at position 2
I0612 14:22:27.388219  5932 replica.cpp:508] Replica received write request for 
position 2
I0612 14:22:27.388254  5932 leveldb.cpp:343] Persisting action (16 bytes) to 
leveldb took 17772ns
I0612 14:22:27.388265  5932 replica.cpp:676] Persisted action at 2
I0612 14:22:27.388435  5932 replica.cpp:655] Replica received learned notice 
for position 2
I0612 14:22:27.388466  5932 leveldb.cpp:343] Persisting action (18 bytes) to 
leveldb took 14911ns
I0612 14:22:27.388483  5932 leveldb.cpp:401] Deleting ~1 keys from leveldb took 
5312ns
I0612 14:22:27.388490  5932 replica.cpp:676] Persisted action at 2
I0612 14:22:27.388499  5932 replica.cpp:661] Replica learned TRUNCATE action at 
position 2
I0612 14:22:27.391860  5908 mesos_containerizer.cpp:124] Using isolation: 
posix/cpu,posix/mem
I0612 14:22:27.395467  5934 slave.cpp:143] Slave started on 
36)@192.168.122.135:41488
I0612 14:22:27.395489  5934 credentials.hpp:35] Loading credentials for 
authentication from 
'/tmp/SlaveRecoveryTest_0_RecoverTerminatedExecutor_yetcHO/credential'
I0612 14:22:27.395561  5934 slave.cpp:243] Slave using credential for: 
test-principal
I0612 14:22:27.395660  5934 slave.cpp:256] Slave resources: cpus(*):2; 
mem(*):1024; disk(*):1024; ports(*):[31000-32000]
I0612 14:22:27.396206  5934 slave.cpp:284] Slave hostname: fedora-19
I0612 14:22:27.396225  5934 slave.cpp:285] Slave checkpoint: true
I0612 14:22:27.396888  5934 state.cpp:33] Recovering state from 
'/tmp/SlaveRecoveryTest_0_RecoverTerminatedExecutor_yetcHO/meta'
I0612 14:22:27.396986  5934 status_update_manager.cpp:193] Recovering status 
update manager
I0612 14:22:27.397034  5934 mesos_containerizer.cpp:281] Recovering 
containerizer
I0612 14:22:27.397214  5934 slave.cpp:3024] Finished recovery
I0612 14:22:27.397889  5934 slave.cpp:537] New master detected at 
[email protected]:41488
I0612 14:22:27.398550  5934 slave.cpp:613] Authenticating with master 
[email protected]:41488
I0612 14:22:27.398824  5930 status_update_manager.cpp:167] New master detected 
at [email protected]:41488
I0612 14:22:27.399489  5932 authenticatee.hpp:128] Creating new client SASL 
connection
I0612 14:22:27.400174  5932 master.cpp:3341] Authenticating 
slave(36)@192.168.122.135:41488
I0612 14:22:27.400259  5932 authenticator.hpp:156] Creating new server SASL 
connection
I0612 14:22:27.400812  5932 authenticatee.hpp:219] Received SASL authentication 
mechanisms: CRAM-MD5
I0612 14:22:27.400836  5932 authenticatee.hpp:245] Attempting to authenticate 
with mechanism 'CRAM-MD5'
I0612 14:22:27.400861  5932 authenticator.hpp:262] Received SASL authentication 
start
I0612 14:22:27.400899  5932 authenticator.hpp:384] Authentication requires more 
steps
I0612 14:22:27.400926  5932 authenticatee.hpp:265] Received SASL authentication 
step
I0612 14:22:27.400961  5932 authenticator.hpp:290] Received SASL authentication 
step
I0612 14:22:27.400979  5932 auxprop.cpp:81] Request to lookup properties for 
user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' 
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
SASL_AUXPROP_AUTHZID: false 
I0612 14:22:27.400986  5932 auxprop.cpp:153] Looking up auxiliary property 
'*userPassword'
I0612 14:22:27.400998  5932 auxprop.cpp:153] Looking up auxiliary property 
'*cmusaslsecretCRAM-MD5'
I0612 14:22:27.401007  5932 auxprop.cpp:81] Request to lookup properties for 
user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' 
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
SASL_AUXPROP_AUTHZID: true 
I0612 14:22:27.401015  5932 auxprop.cpp:103] Skipping auxiliary property 
'*userPassword' since SASL_AUXPROP_AUTHZID == true
I0612 14:22:27.401021  5932 auxprop.cpp:103] Skipping auxiliary property 
'*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I0612 14:22:27.401031  5932 authenticator.hpp:376] Authentication success
I0612 14:22:27.401057  5932 authenticatee.hpp:305] Authentication success
I0612 14:22:27.401077  5932 master.cpp:3381] Successfully authenticated 
principal 'test-principal' at slave(36)@192.168.122.135:41488
I0612 14:22:27.401557  5934 slave.cpp:586] Detecting new master
I0612 14:22:27.401631  5934 slave.cpp:670] Successfully authenticated with 
master [email protected]:41488
I0612 14:22:27.401676  5934 slave.cpp:908] Will retry registration in 
8.328149ms if necessary
I0612 14:22:27.402134  5932 master.cpp:2620] Registering slave at 
slave(36)@192.168.122.135:41488 (fedora-19) with id 
20140612-142227-2272962752-41488-5908-0
I0612 14:22:27.402240  5932 registrar.cpp:422] Attempting to update the 
'registry'
I0612 14:22:27.410532  5908 sched.cpp:128] Version: 0.20.0
I0612 14:22:27.411260  5929 slave.cpp:908] Will retry registration in 8.25517ms 
if necessary
I0612 14:22:27.411367  5928 master.cpp:2608] Ignoring register slave message 
from slave(36)@192.168.122.135:41488 (fedora-19) as admission is already in 
progress
I0612 14:22:27.412156  5932 sched.cpp:224] New master detected at 
[email protected]:41488
I0612 14:22:27.412184  5932 sched.cpp:274] Authenticating with master 
[email protected]:41488
I0612 14:22:27.412269  5932 authenticatee.hpp:128] Creating new client SASL 
connection
I0612 14:22:27.413010  5931 master.cpp:3341] Authenticating 
scheduler(35)@192.168.122.135:41488
I0612 14:22:27.413103  5931 authenticator.hpp:156] Creating new server SASL 
connection
I0612 14:22:27.413966  5932 authenticatee.hpp:219] Received SASL authentication 
mechanisms: CRAM-MD5
I0612 14:22:27.413993  5932 authenticatee.hpp:245] Attempting to authenticate 
with mechanism 'CRAM-MD5'
I0612 14:22:27.414026  5932 authenticator.hpp:262] Received SASL authentication 
start
I0612 14:22:27.414067  5932 authenticator.hpp:384] Authentication requires more 
steps
I0612 14:22:27.414105  5932 authenticatee.hpp:265] Received SASL authentication 
step
I0612 14:22:27.414170  5932 authenticator.hpp:290] Received SASL authentication 
step
I0612 14:22:27.414202  5932 auxprop.cpp:81] Request to lookup properties for 
user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' 
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
SASL_AUXPROP_AUTHZID: false 
I0612 14:22:27.414222  5932 auxprop.cpp:153] Looking up auxiliary property 
'*userPassword'
I0612 14:22:27.414242  5932 auxprop.cpp:153] Looking up auxiliary property 
'*cmusaslsecretCRAM-MD5'
I0612 14:22:27.414263  5932 auxprop.cpp:81] Request to lookup properties for 
user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' 
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
SASL_AUXPROP_AUTHZID: true 
I0612 14:22:27.414278  5932 auxprop.cpp:103] Skipping auxiliary property 
'*userPassword' since SASL_AUXPROP_AUTHZID == true
I0612 14:22:27.414778  5932 auxprop.cpp:103] Skipping auxiliary property 
'*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I0612 14:22:27.414808  5932 authenticator.hpp:376] Authentication success
I0612 14:22:27.415191  5929 authenticatee.hpp:305] Authentication success
I0612 14:22:27.415879  5927 sched.cpp:348] Successfully authenticated with 
master [email protected]:41488
I0612 14:22:27.415909  5927 sched.cpp:467] Sending registration request to 
[email protected]:41488
I0612 14:22:27.416354  5930 master.cpp:3381] Successfully authenticated 
principal 'test-principal' at scheduler(35)@192.168.122.135:41488
I0612 14:22:27.416837  5930 master.cpp:1080] Received registration request from 
scheduler(35)@192.168.122.135:41488
I0612 14:22:27.416878  5930 master.cpp:1040] Authorizing framework principal 
'test-principal' to receive offers for role '*'
I0612 14:22:27.417507  5930 master.cpp:1139] Registering framework 
20140612-142227-2272962752-41488-5908-0000 at 
scheduler(35)@192.168.122.135:41488
I0612 14:22:27.417613  5930 sched.cpp:398] Framework registered with 
20140612-142227-2272962752-41488-5908-0000
I0612 14:22:27.417650  5930 sched.cpp:412] Scheduler::registered took 14831ns
I0612 14:22:27.417704  5930 hierarchical_allocator_process.hpp:331] Added 
framework 20140612-142227-2272962752-41488-5908-0000
I0612 14:22:27.417721  5930 hierarchical_allocator_process.hpp:724] No 
resources available to allocate!
I0612 14:22:27.417733  5930 hierarchical_allocator_process.hpp:686] Performed 
allocation for 0 slaves in 13083ns
I0612 14:22:27.421093  5929 slave.cpp:908] Will retry registration in 
75.800166ms if necessary
I0612 14:22:27.421195  5929 master.cpp:2608] Ignoring register slave message 
from slave(36)@192.168.122.135:41488 (fedora-19) as admission is already in 
progress
I0612 14:22:27.497810  5929 slave.cpp:908] Will retry registration in 
158.537849ms if necessary
I0612 14:22:27.497961  5929 master.cpp:2608] Ignoring register slave message 
from slave(36)@192.168.122.135:41488 (fedora-19) as admission is already in 
progress
I0612 14:22:27.657577  5929 slave.cpp:908] Will retry registration in 
309.183804ms if necessary
I0612 14:22:27.657702  5929 master.cpp:2608] Ignoring register slave message 
from slave(36)@192.168.122.135:41488 (fedora-19) as admission is already in 
progress
I0612 14:22:27.967582  5929 slave.cpp:908] Will retry registration in 
332.710217ms if necessary
I0612 14:22:27.967701  5929 master.cpp:2608] Ignoring register slave message 
from slave(36)@192.168.122.135:41488 (fedora-19) as admission is already in 
progress
I0612 14:22:28.301319  5929 slave.cpp:908] Will retry registration in 
332.949975ms if necessary
I0612 14:22:28.301476  5929 master.cpp:2608] Ignoring register slave message 
from slave(36)@192.168.122.135:41488 (fedora-19) as admission is already in 
progress
I0612 14:22:28.369415  5929 hierarchical_allocator_process.hpp:724] No 
resources available to allocate!
I0612 14:22:28.369457  5929 hierarchical_allocator_process.hpp:686] Performed 
allocation for 0 slaves in 55655ns
I0612 14:22:28.634922  5929 slave.cpp:908] Will retry registration in 
1.909975292secs if necessary
I0612 14:22:28.635051  5929 master.cpp:2608] Ignoring register slave message 
from slave(36)@192.168.122.135:41488 (fedora-19) as admission is already in 
progress
I0612 14:22:29.369622  5929 hierarchical_allocator_process.hpp:724] No 
resources available to allocate!
I0612 14:22:29.369673  5929 hierarchical_allocator_process.hpp:686] Performed 
allocation for 0 slaves in 63149ns
2014-06-12 
14:22:29,391:5908(0x7fc2e44b7700):ZOO_ERROR@handle_socket_error_msg@1697: 
Socket [127.0.0.1:48451] zk retcode=-4, errno=111(Connection refused): server 
refused to accept the client
I0612 14:22:30.370820  5929 hierarchical_allocator_process.hpp:724] No 
resources available to allocate!
I0612 14:22:30.370887  5929 hierarchical_allocator_process.hpp:686] Performed 
allocation for 0 slaves in 75508ns
I0612 14:22:30.545195  5929 slave.cpp:908] Will retry registration in 
4.490573386secs if necessary
I0612 14:22:30.545390  5929 master.cpp:2608] Ignoring register slave message 
from slave(36)@192.168.122.135:41488 (fedora-19) as admission is already in 
progress
I0612 14:22:31.371768  5929 hierarchical_allocator_process.hpp:724] No 
resources available to allocate!
I0612 14:22:31.371814  5929 hierarchical_allocator_process.hpp:686] Performed 
allocation for 0 slaves in 57909ns
I0612 14:22:32.370002  5929 master.cpp:114] No whitelist given. Advertising 
offers for all slaves
I0612 14:22:32.372148  5929 hierarchical_allocator_process.hpp:724] No 
resources available to allocate!
I0612 14:22:32.372166  5929 hierarchical_allocator_process.hpp:686] Performed 
allocation for 0 slaves in 22692ns
2014-06-12 
14:22:32,727:5908(0x7fc2e44b7700):ZOO_ERROR@handle_socket_error_msg@1697: 
Socket [127.0.0.1:48451] zk retcode=-4, errno=111(Connection refused): server 
refused to accept the client
I0612 14:22:33.372736  5929 hierarchical_allocator_process.hpp:724] No 
resources available to allocate!
I0612 14:22:33.372779  5929 hierarchical_allocator_process.hpp:686] Performed 
allocation for 0 slaves in 52078ns
I0612 14:22:34.373733  5929 hierarchical_allocator_process.hpp:724] No 
resources available to allocate!
I0612 14:22:34.373778  5929 hierarchical_allocator_process.hpp:686] Performed 
allocation for 0 slaves in 54020ns
I0612 14:22:35.036500  5929 slave.cpp:908] Will retry registration in 
4.152129547secs if necessary
I0612 14:22:35.036608  5929 master.cpp:2608] Ignoring register slave message 
from slave(36)@192.168.122.135:41488 (fedora-19) as admission is already in 
progress
I0612 14:22:35.373988  5929 hierarchical_allocator_process.hpp:724] No 
resources available to allocate!
I0612 14:22:35.374033  5929 hierarchical_allocator_process.hpp:686] Performed 
allocation for 0 slaves in 59902ns
2014-06-12 
14:22:36,063:5908(0x7fc2e44b7700):ZOO_ERROR@handle_socket_error_msg@1697: 
Socket [127.0.0.1:48451] zk retcode=-4, errno=111(Connection refused): server 
refused to accept the client
I0612 14:22:36.374763  5929 hierarchical_allocator_process.hpp:724] No 
resources available to allocate!
I0612 14:22:36.374807  5929 hierarchical_allocator_process.hpp:686] Performed 
allocation for 0 slaves in 53729ns
I0612 14:22:37.370941  5929 master.cpp:114] No whitelist given. Advertising 
offers for all slaves
I0612 14:22:37.375161  5929 hierarchical_allocator_process.hpp:724] No 
resources available to allocate!
I0612 14:22:37.375181  5929 hierarchical_allocator_process.hpp:686] Performed 
allocation for 0 slaves in 23517ns
E0612 14:22:37.404731  5929 registrar.cpp:500] Registrar aborting: Failed to 
update 'registry': Failed to perform store within 10secs
F0612 14:22:37.404844  5929 master.cpp:2642] Failed to admit slave 
20140612-142227-2272962752-41488-5908-0 at slave(36)@192.168.122.135:41488 
(fedora-19): Failed to update 'registry': Failed to perform store within 10secs
*** Check failure stack trace: ***
    @     0x7fc31930fb9d  google::LogMessage::Fail()
    @     0x7fc3193118dc  google::LogMessage::SendToLog()
    @     0x7fc31930f78c  google::LogMessage::Flush()
tests/slave_recovery_tests.cpp:636: Failure
Failed to wait 10secs for offers1
    @     0x7fc3193121d9  google::LogMessageFatal::~LogMessageFatal()
    @     0x7fc318e41285  mesos::internal::master::Master::_registerSlave()
    @     0x7fc319244092  process::ProcessManager::resume()
    @     0x7fc31924438c  process::schedule()
    @       0x3284c07c53  (unknown)
    @       0x32844f5dbd  (unknown)




--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to