Yan Xu created MESOS-1428:
-----------------------------
Summary: Failed to update 'registry': Failed to perform store
within 5secs (caused flaky MasterTest.StatusUpdateAcknowledgementsThroughMaster)
Key: MESOS-1428
URL: https://issues.apache.org/jira/browse/MESOS-1428
Project: Mesos
Issue Type: Bug
Reporter: Yan Xu
{noformat}
[ RUN ] MasterTest.StatusUpdateAcknowledgementsThroughMaster
Using temporary directory
'/tmp/MasterTest_StatusUpdateAcknowledgementsThroughMaster_QbH4lp'
I0528 12:19:49.619946 30850 leveldb.cpp:176] Opened db in 719729ns
I0528 12:19:49.620810 30850 leveldb.cpp:183] Compacted db in 274760ns
I0528 12:19:49.621376 30850 leveldb.cpp:198] Created db iterator in 17543ns
I0528 12:19:49.621711 30850 leveldb.cpp:204] Seeked to beginning of db in
13943ns
I0528 12:19:49.622164 30850 leveldb.cpp:273] Iterated through 0 keys in the db
in 13863ns
I0528 12:19:49.622588 30850 replica.cpp:741] Replica recovered with log
positions 0 -> 0 with 1 holes and 0 unlearned
I0528 12:19:49.623253 30871 recover.cpp:425] Starting replica recovery
I0528 12:19:49.623350 30871 recover.cpp:451] Replica is in EMPTY status
I0528 12:19:49.623810 30868 replica.cpp:638] Replica in EMPTY status received a
broadcasted recover request
I0528 12:19:49.624260 30871 recover.cpp:188] Received a recover response from a
replica in EMPTY status
I0528 12:19:49.624357 30871 recover.cpp:542] Updating replica status to STARTING
I0528 12:19:49.624482 30871 leveldb.cpp:306] Persisting metadata (8 bytes) to
leveldb took 50522ns
I0528 12:19:49.624498 30871 replica.cpp:320] Persisted replica status to
STARTING
I0528 12:19:49.624552 30871 recover.cpp:451] Replica is in STARTING status
I0528 12:19:49.624794 30871 replica.cpp:638] Replica in STARTING status
received a broadcasted recover request
I0528 12:19:49.624848 30871 recover.cpp:188] Received a recover response from a
replica in STARTING status
I0528 12:19:49.624943 30871 recover.cpp:542] Updating replica status to VOTING
I0528 12:19:49.625004 30871 leveldb.cpp:306] Persisting metadata (8 bytes) to
leveldb took 22931ns
I0528 12:19:49.625016 30871 replica.cpp:320] Persisted replica status to VOTING
I0528 12:19:49.625699 30871 recover.cpp:556] Successfully joined the Paxos group
I0528 12:19:49.625758 30871 recover.cpp:440] Recover process terminated
I0528 12:19:49.630477 30869 master.cpp:272] Master
20140528-121949-2272962752-54693-30850 (fedora-19) started on
192.168.122.135:54693
I0528 12:19:49.630508 30869 master.cpp:309] Master only allowing authenticated
frameworks to register
I0528 12:19:49.630516 30869 master.cpp:314] Master only allowing authenticated
slaves to register
I0528 12:19:49.630523 30869 credentials.hpp:35] Loading credentials for
authentication from
'/tmp/MasterTest_StatusUpdateAcknowledgementsThroughMaster_QbH4lp/credentials'
I0528 12:19:49.630601 30869 master.cpp:340] Master enabling authorization
I0528 12:19:49.631073 30868 master.cpp:108] No whitelist given. Advertising
offers for all slaves
I0528 12:19:49.631428 30874 hierarchical_allocator_process.hpp:301]
Initializing hierarchical allocator process with master :
[email protected]:54693
I0528 12:19:49.632640 30869 master.cpp:943] The newly elected leader is
[email protected]:54693 with id 20140528-121949-2272962752-54693-30850
I0528 12:19:49.632664 30869 master.cpp:956] Elected as the leading master!
I0528 12:19:49.632673 30869 master.cpp:774] Recovering from registrar
I0528 12:19:49.632733 30869 registrar.cpp:313] Recovering registrar
I0528 12:19:49.632969 30869 log.cpp:656] Attempting to start the writer
I0528 12:19:49.633388 30869 replica.cpp:474] Replica received implicit promise
request with proposal 1
I0528 12:19:49.633424 30869 leveldb.cpp:306] Persisting metadata (8 bytes) to
leveldb took 23276ns
I0528 12:19:49.633435 30869 replica.cpp:342] Persisted promised to 1
I0528 12:19:49.634053 30873 coordinator.cpp:230] Coordinator attemping to fill
missing position
I0528 12:19:49.634415 30873 replica.cpp:375] Replica received explicit promise
request for position 0 with proposal 2
I0528 12:19:49.634454 30873 leveldb.cpp:343] Persisting action (8 bytes) to
leveldb took 25866ns
I0528 12:19:49.634464 30873 replica.cpp:676] Persisted action at 0
I0528 12:19:49.634755 30873 replica.cpp:508] Replica received write request for
position 0
I0528 12:19:49.634791 30873 leveldb.cpp:438] Reading position from leveldb took
25880ns
I0528 12:19:49.634819 30873 leveldb.cpp:343] Persisting action (14 bytes) to
leveldb took 18061ns
I0528 12:19:49.634829 30873 replica.cpp:676] Persisted action at 0
I0528 12:19:49.634948 30873 replica.cpp:655] Replica received learned notice
for position 0
I0528 12:19:49.634976 30873 leveldb.cpp:343] Persisting action (16 bytes) to
leveldb took 19561ns
I0528 12:19:49.634987 30873 replica.cpp:676] Persisted action at 0
I0528 12:19:49.634994 30873 replica.cpp:661] Replica learned NOP action at
position 0
I0528 12:19:49.635668 30869 log.cpp:672] Writer started with ending position 0
I0528 12:19:49.636436 30869 leveldb.cpp:438] Reading position from leveldb took
22086ns
I0528 12:19:49.640519 30868 registrar.cpp:346] Successfully fetched the
registry (0B)
I0528 12:19:49.640575 30868 registrar.cpp:422] Attempting to update the
'registry'
I0528 12:19:49.641543 30868 log.cpp:680] Attempting to append 131 bytes to the
log
I0528 12:19:49.641621 30868 coordinator.cpp:340] Coordinator attempting to
write APPEND action at position 1
I0528 12:19:49.642545 30869 replica.cpp:508] Replica received write request for
position 1
I0528 12:19:49.642987 30869 leveldb.cpp:343] Persisting action (150 bytes) to
leveldb took 57416ns
I0528 12:19:49.643421 30869 replica.cpp:676] Persisted action at 1
I0528 12:19:49.644212 30871 replica.cpp:655] Replica received learned notice
for position 1
I0528 12:19:49.644254 30871 leveldb.cpp:343] Persisting action (152 bytes) to
leveldb took 25541ns
I0528 12:19:49.644266 30871 replica.cpp:676] Persisted action at 1
I0528 12:19:49.644273 30871 replica.cpp:661] Replica learned APPEND action at
position 1
I0528 12:19:49.648641 30872 registrar.cpp:479] Successfully updated 'registry'
I0528 12:19:49.648681 30872 registrar.cpp:372] Successfully recovered registrar
I0528 12:19:49.648738 30872 master.cpp:801] Recovered 0 slaves from the
Registry (95B) ; allowing 10mins for slaves to re-register
I0528 12:19:49.650076 30874 log.cpp:699] Attempting to truncate the log to 1
I0528 12:19:49.650135 30874 coordinator.cpp:340] Coordinator attempting to
write TRUNCATE action at position 2
I0528 12:19:49.650290 30874 replica.cpp:508] Replica received write request for
position 2
I0528 12:19:49.650326 30874 leveldb.cpp:343] Persisting action (16 bytes) to
leveldb took 24238ns
I0528 12:19:49.650337 30874 replica.cpp:676] Persisted action at 2
I0528 12:19:49.650475 30874 replica.cpp:655] Replica received learned notice
for position 2
I0528 12:19:49.650504 30874 leveldb.cpp:343] Persisting action (18 bytes) to
leveldb took 21030ns
I0528 12:19:49.650521 30874 leveldb.cpp:401] Deleting ~1 keys from leveldb took
10152ns
I0528 12:19:49.650531 30874 replica.cpp:676] Persisted action at 2
I0528 12:19:49.650537 30874 replica.cpp:661] Replica learned TRUNCATE action at
position 2
I0528 12:19:49.655011 30875 slave.cpp:143] Slave started on
41)@192.168.122.135:54693
I0528 12:19:49.655096 30875 credentials.hpp:35] Loading credentials for
authentication from
'/tmp/MasterTest_StatusUpdateAcknowledgementsThroughMaster_orwcE4/credential'
I0528 12:19:49.655164 30875 slave.cpp:242] Slave using credential for:
test-principal
I0528 12:19:49.655258 30875 slave.cpp:255] Slave resources: cpus(*):2;
mem(*):1024; disk(*):1024; ports(*):[31000-32000]
I0528 12:19:49.655769 30875 slave.cpp:283] Slave hostname: fedora-19
I0528 12:19:49.655787 30875 slave.cpp:284] Slave checkpoint: false
I0528 12:19:49.656296 30868 state.cpp:33] Recovering state from
'/tmp/MasterTest_StatusUpdateAcknowledgementsThroughMaster_orwcE4/meta'
I0528 12:19:49.656734 30875 status_update_manager.cpp:193] Recovering status
update manager
I0528 12:19:49.656857 30875 slave.cpp:3019] Finished recovery
I0528 12:19:49.657474 30875 slave.cpp:536] New master detected at
[email protected]:54693
I0528 12:19:49.658026 30870 status_update_manager.cpp:167] New master detected
at [email protected]:54693
I0528 12:19:49.658486 30875 slave.cpp:612] Authenticating with master
[email protected]:54693
I0528 12:19:49.658818 30875 slave.cpp:585] Detecting new master
I0528 12:19:49.658874 30875 authenticatee.hpp:128] Creating new client SASL
connection
I0528 12:19:49.659700 30875 master.cpp:2981] Authenticating
slave(41)@192.168.122.135:54693
I0528 12:19:49.660415 30872 authenticator.hpp:156] Creating new server SASL
connection
I0528 12:19:49.660895 30872 authenticatee.hpp:219] Received SASL authentication
mechanisms: CRAM-MD5
I0528 12:19:49.660919 30872 authenticatee.hpp:245] Attempting to authenticate
with mechanism 'CRAM-MD5'
I0528 12:19:49.660944 30872 authenticator.hpp:262] Received SASL authentication
start
I0528 12:19:49.660996 30872 authenticator.hpp:384] Authentication requires more
steps
I0528 12:19:49.661142 30872 authenticatee.hpp:265] Received SASL authentication
step
I0528 12:19:49.661201 30872 authenticator.hpp:290] Received SASL authentication
step
I0528 12:19:49.661221 30872 auxprop.cpp:81] Request to lookup properties for
user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19'
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false
SASL_AUXPROP_AUTHZID: false
I0528 12:19:49.661228 30872 auxprop.cpp:153] Looking up auxiliary property
'*userPassword'
I0528 12:19:49.661238 30872 auxprop.cpp:153] Looking up auxiliary property
'*cmusaslsecretCRAM-MD5'
I0528 12:19:49.661258 30872 auxprop.cpp:81] Request to lookup properties for
user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19'
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false
SASL_AUXPROP_AUTHZID: true
I0528 12:19:49.661267 30872 auxprop.cpp:103] Skipping auxiliary property
'*userPassword' since SASL_AUXPROP_AUTHZID == true
I0528 12:19:49.661273 30872 auxprop.cpp:103] Skipping auxiliary property
'*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I0528 12:19:49.661284 30872 authenticator.hpp:376] Authentication success
I0528 12:19:49.661308 30872 authenticatee.hpp:305] Authentication success
I0528 12:19:49.661351 30872 slave.cpp:669] Successfully authenticated with
master [email protected]:54693
I0528 12:19:49.661396 30872 slave.cpp:903] Will retry registration in
18.309152ms if necessary
I0528 12:19:49.661963 30875 master.cpp:2239] Queuing up registration request
from slave(41)@192.168.122.135:54693 because authentication is still in progress
I0528 12:19:49.662287 30875 master.cpp:3021] Successfully authenticated
principal 'test-principal' at slave(41)@192.168.122.135:54693
I0528 12:19:49.665110 30875 master.cpp:2297] Registering slave at
slave(41)@192.168.122.135:54693 (fedora-19) with id
20140528-121949-2272962752-54693-30850-0
I0528 12:19:49.666139 30870 registrar.cpp:422] Attempting to update the
'registry'
I0528 12:19:49.674161 30850 sched.cpp:126] Version: 0.19.0
I0528 12:19:49.675194 30873 sched.cpp:222] New master detected at
[email protected]:54693
I0528 12:19:49.675230 30873 sched.cpp:273] Authenticating with master
[email protected]:54693
I0528 12:19:49.675297 30873 authenticatee.hpp:128] Creating new client SASL
connection
I0528 12:19:49.675968 30873 master.cpp:2981] Authenticating
scheduler(35)@192.168.122.135:54693
I0528 12:19:49.676316 30873 authenticator.hpp:156] Creating new server SASL
connection
I0528 12:19:49.676792 30873 authenticatee.hpp:219] Received SASL authentication
mechanisms: CRAM-MD5
I0528 12:19:49.676820 30873 authenticatee.hpp:245] Attempting to authenticate
with mechanism 'CRAM-MD5'
I0528 12:19:49.676847 30873 authenticator.hpp:262] Received SASL authentication
start
I0528 12:19:49.676878 30873 authenticator.hpp:384] Authentication requires more
steps
I0528 12:19:49.676903 30873 authenticatee.hpp:265] Received SASL authentication
step
I0528 12:19:49.676934 30873 authenticator.hpp:290] Received SASL authentication
step
I0528 12:19:49.676949 30873 auxprop.cpp:81] Request to lookup properties for
user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19'
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false
SASL_AUXPROP_AUTHZID: false
I0528 12:19:49.676956 30873 auxprop.cpp:153] Looking up auxiliary property
'*userPassword'
I0528 12:19:49.676965 30873 auxprop.cpp:153] Looking up auxiliary property
'*cmusaslsecretCRAM-MD5'
I0528 12:19:49.676975 30873 auxprop.cpp:81] Request to lookup properties for
user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19'
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false
SASL_AUXPROP_AUTHZID: true
I0528 12:19:49.676981 30873 auxprop.cpp:103] Skipping auxiliary property
'*userPassword' since SASL_AUXPROP_AUTHZID == true
I0528 12:19:49.676986 30873 auxprop.cpp:103] Skipping auxiliary property
'*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I0528 12:19:49.676997 30873 authenticator.hpp:376] Authentication success
I0528 12:19:49.677021 30873 authenticatee.hpp:305] Authentication success
I0528 12:19:49.677472 30868 sched.cpp:347] Successfully authenticated with
master [email protected]:54693
I0528 12:19:49.677491 30868 sched.cpp:466] Sending registration request to
[email protected]:54693
I0528 12:19:49.677085 30871 master.cpp:3021] Successfully authenticated
principal 'test-principal' at scheduler(35)@192.168.122.135:54693
I0528 12:19:49.677599 30871 master.cpp:1027] Received registration request from
scheduler(35)@192.168.122.135:54693
I0528 12:19:49.677635 30871 master.cpp:1045] Registering framework
20140528-121949-2272962752-54693-30850-0000 at
scheduler(35)@192.168.122.135:54693
I0528 12:19:49.677695 30871 sched.cpp:397] Framework registered with
20140528-121949-2272962752-54693-30850-0000
I0528 12:19:49.677714 30871 sched.cpp:411] Scheduler::registered took 12474ns
I0528 12:19:49.677744 30871 hierarchical_allocator_process.hpp:331] Added
framework 20140528-121949-2272962752-54693-30850-0000
I0528 12:19:49.677754 30871 hierarchical_allocator_process.hpp:725] No
resources available to allocate!
I0528 12:19:49.677760 30871 hierarchical_allocator_process.hpp:687] Performed
allocation for 0 slaves in 10063ns
I0528 12:19:49.680667 30873 slave.cpp:903] Will retry registration in
24.147524ms if necessary
I0528 12:19:49.680727 30873 master.cpp:2285] Ignoring register slave message
from slave(41)@192.168.122.135:54693 (fedora-19) as admission is already in
progress
I0528 12:19:49.705091 30873 slave.cpp:903] Will retry registration in
13.305655ms if necessary
I0528 12:19:49.705150 30873 master.cpp:2285] Ignoring register slave message
from slave(41)@192.168.122.135:54693 (fedora-19) as admission is already in
progress
I0528 12:19:49.719496 30873 slave.cpp:903] Will retry registration in
101.534376ms if necessary
I0528 12:19:49.719555 30873 master.cpp:2285] Ignoring register slave message
from slave(41)@192.168.122.135:54693 (fedora-19) as admission is already in
progress
I0528 12:19:49.821956 30873 slave.cpp:903] Will retry registration in
51.019643ms if necessary
I0528 12:19:49.822016 30873 master.cpp:2285] Ignoring register slave message
from slave(41)@192.168.122.135:54693 (fedora-19) as admission is already in
progress
I0528 12:19:49.873402 30873 slave.cpp:903] Will retry registration in
623.788265ms if necessary
I0528 12:19:49.873461 30873 master.cpp:2285] Ignoring register slave message
from slave(41)@192.168.122.135:54693 (fedora-19) as admission is already in
progress
I0528 12:19:50.498261 30873 slave.cpp:903] Will retry registration in
1.10917121secs if necessary
I0528 12:19:50.498359 30873 master.cpp:2285] Ignoring register slave message
from slave(41)@192.168.122.135:54693 (fedora-19) as admission is already in
progress
I0528 12:19:50.632482 30873 hierarchical_allocator_process.hpp:725] No
resources available to allocate!
I0528 12:19:50.632500 30873 hierarchical_allocator_process.hpp:687] Performed
allocation for 0 slaves in 29372ns
I0528 12:19:51.608304 30873 slave.cpp:903] Will retry registration in
2.038690505secs if necessary
I0528 12:19:51.608422 30873 master.cpp:2285] Ignoring register slave message
from slave(41)@192.168.122.135:54693 (fedora-19) as admission is already in
progress
I0528 12:19:51.633369 30873 hierarchical_allocator_process.hpp:725] No
resources available to allocate!
I0528 12:19:51.633388 30873 hierarchical_allocator_process.hpp:687] Performed
allocation for 0 slaves in 29261ns
I0528 12:19:52.634245 30873 hierarchical_allocator_process.hpp:725] No
resources available to allocate!
I0528 12:19:52.634287 30873 hierarchical_allocator_process.hpp:687] Performed
allocation for 0 slaves in 56100ns
I0528 12:19:53.634836 30873 hierarchical_allocator_process.hpp:725] No
resources available to allocate!
I0528 12:19:53.634865 30873 hierarchical_allocator_process.hpp:687] Performed
allocation for 0 slaves in 45025ns
I0528 12:19:53.648061 30873 slave.cpp:903] Will retry registration in
4.21929484secs if necessary
I0528 12:19:53.648147 30873 master.cpp:2285] Ignoring register slave message
from slave(41)@192.168.122.135:54693 (fedora-19) as admission is already in
progress
I0528 12:19:54.632284 30873 master.cpp:108] No whitelist given. Advertising
offers for all slaves
I0528 12:19:54.635412 30873 hierarchical_allocator_process.hpp:725] No
resources available to allocate!
I0528 12:19:54.635432 30873 hierarchical_allocator_process.hpp:687] Performed
allocation for 0 slaves in 27871ns
E0528 12:19:54.668265 30873 registrar.cpp:500] Registrar aborting: Failed to
update 'registry': Failed to perform store within 5secs
F0528 12:19:54.668367 30873 master.cpp:2319] Failed to admit slave
20140528-121949-2272962752-54693-30850-0 at slave(41)@192.168.122.135:54693
(fedora-19): Failed to update 'registry': Failed to perform store within 5secs
*** Check failure stack trace: ***
@ 0x7f2632732ac6 google::LogMessage::SendToLog()
@ 0x7f2632732f2e google::LogMessage::Flush()
@ 0x7f26327365c9 google::LogMessageFatal::~LogMessageFatal()
@ 0x7f26320b52fc mesos::internal::master::Master::_registerSlave()
@ 0x7f263263279c process::ProcessManager::resume()
@ 0x7f26326322ef process::schedule()
@ 0x3284c07c53 (unknown)
@ 0x32844f5dbd (unknown)
make[3]: *** [check-local] Aborted (core dumped)
{noformat}
1. In some cases 5secs timeout for registry storage is probably too small. We
can use a bigger value for the flag {{registry_store_timeout}} at least in
tests.
2. Use EXIT(1)
--
This message was sent by Atlassian JIRA
(v6.2#6252)