[ 
https://issues.apache.org/jira/browse/MESOS-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone resolved MESOS-1428.
-------------------------------

       Resolution: Fixed
    Fix Version/s: 0.20.0
         Assignee: Benjamin Mahler

commit 81c05c973afdb0536de61900320ca44d8ce7c106
Author: Benjamin Mahler <[email protected]>
Date:   Tue Jun 3 17:56:52 2014 -0700

    Increased the registry store timeout for tests.


> Failed to update 'registry': Failed to perform store within 5secs (caused 
> flaky MasterTest.StatusUpdateAcknowledgementsThroughMaster)
> -------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MESOS-1428
>                 URL: https://issues.apache.org/jira/browse/MESOS-1428
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Yan Xu
>            Assignee: Benjamin Mahler
>             Fix For: 0.20.0
>
>
> {noformat}
> [ RUN      ] MasterTest.StatusUpdateAcknowledgementsThroughMaster
> Using temporary directory 
> '/tmp/MasterTest_StatusUpdateAcknowledgementsThroughMaster_QbH4lp'
> I0528 12:19:49.619946 30850 leveldb.cpp:176] Opened db in 719729ns
> I0528 12:19:49.620810 30850 leveldb.cpp:183] Compacted db in 274760ns
> I0528 12:19:49.621376 30850 leveldb.cpp:198] Created db iterator in 17543ns
> I0528 12:19:49.621711 30850 leveldb.cpp:204] Seeked to beginning of db in 
> 13943ns
> I0528 12:19:49.622164 30850 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 13863ns
> I0528 12:19:49.622588 30850 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0528 12:19:49.623253 30871 recover.cpp:425] Starting replica recovery
> I0528 12:19:49.623350 30871 recover.cpp:451] Replica is in EMPTY status
> I0528 12:19:49.623810 30868 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I0528 12:19:49.624260 30871 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I0528 12:19:49.624357 30871 recover.cpp:542] Updating replica status to 
> STARTING
> I0528 12:19:49.624482 30871 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 50522ns
> I0528 12:19:49.624498 30871 replica.cpp:320] Persisted replica status to 
> STARTING
> I0528 12:19:49.624552 30871 recover.cpp:451] Replica is in STARTING status
> I0528 12:19:49.624794 30871 replica.cpp:638] Replica in STARTING status 
> received a broadcasted recover request
> I0528 12:19:49.624848 30871 recover.cpp:188] Received a recover response from 
> a replica in STARTING status
> I0528 12:19:49.624943 30871 recover.cpp:542] Updating replica status to VOTING
> I0528 12:19:49.625004 30871 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 22931ns
> I0528 12:19:49.625016 30871 replica.cpp:320] Persisted replica status to 
> VOTING
> I0528 12:19:49.625699 30871 recover.cpp:556] Successfully joined the Paxos 
> group
> I0528 12:19:49.625758 30871 recover.cpp:440] Recover process terminated
> I0528 12:19:49.630477 30869 master.cpp:272] Master 
> 20140528-121949-2272962752-54693-30850 (fedora-19) started on 
> 192.168.122.135:54693
> I0528 12:19:49.630508 30869 master.cpp:309] Master only allowing 
> authenticated frameworks to register
> I0528 12:19:49.630516 30869 master.cpp:314] Master only allowing 
> authenticated slaves to register
> I0528 12:19:49.630523 30869 credentials.hpp:35] Loading credentials for 
> authentication from 
> '/tmp/MasterTest_StatusUpdateAcknowledgementsThroughMaster_QbH4lp/credentials'
> I0528 12:19:49.630601 30869 master.cpp:340] Master enabling authorization
> I0528 12:19:49.631073 30868 master.cpp:108] No whitelist given. Advertising 
> offers for all slaves
> I0528 12:19:49.631428 30874 hierarchical_allocator_process.hpp:301] 
> Initializing hierarchical allocator process with master : 
> [email protected]:54693
> I0528 12:19:49.632640 30869 master.cpp:943] The newly elected leader is 
> [email protected]:54693 with id 20140528-121949-2272962752-54693-30850
> I0528 12:19:49.632664 30869 master.cpp:956] Elected as the leading master!
> I0528 12:19:49.632673 30869 master.cpp:774] Recovering from registrar
> I0528 12:19:49.632733 30869 registrar.cpp:313] Recovering registrar
> I0528 12:19:49.632969 30869 log.cpp:656] Attempting to start the writer
> I0528 12:19:49.633388 30869 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I0528 12:19:49.633424 30869 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 23276ns
> I0528 12:19:49.633435 30869 replica.cpp:342] Persisted promised to 1
> I0528 12:19:49.634053 30873 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I0528 12:19:49.634415 30873 replica.cpp:375] Replica received explicit 
> promise request for position 0 with proposal 2
> I0528 12:19:49.634454 30873 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 25866ns
> I0528 12:19:49.634464 30873 replica.cpp:676] Persisted action at 0
> I0528 12:19:49.634755 30873 replica.cpp:508] Replica received write request 
> for position 0
> I0528 12:19:49.634791 30873 leveldb.cpp:438] Reading position from leveldb 
> took 25880ns
> I0528 12:19:49.634819 30873 leveldb.cpp:343] Persisting action (14 bytes) to 
> leveldb took 18061ns
> I0528 12:19:49.634829 30873 replica.cpp:676] Persisted action at 0
> I0528 12:19:49.634948 30873 replica.cpp:655] Replica received learned notice 
> for position 0
> I0528 12:19:49.634976 30873 leveldb.cpp:343] Persisting action (16 bytes) to 
> leveldb took 19561ns
> I0528 12:19:49.634987 30873 replica.cpp:676] Persisted action at 0
> I0528 12:19:49.634994 30873 replica.cpp:661] Replica learned NOP action at 
> position 0
> I0528 12:19:49.635668 30869 log.cpp:672] Writer started with ending position 0
> I0528 12:19:49.636436 30869 leveldb.cpp:438] Reading position from leveldb 
> took 22086ns
> I0528 12:19:49.640519 30868 registrar.cpp:346] Successfully fetched the 
> registry (0B)
> I0528 12:19:49.640575 30868 registrar.cpp:422] Attempting to update the 
> 'registry'
> I0528 12:19:49.641543 30868 log.cpp:680] Attempting to append 131 bytes to 
> the log
> I0528 12:19:49.641621 30868 coordinator.cpp:340] Coordinator attempting to 
> write APPEND action at position 1
> I0528 12:19:49.642545 30869 replica.cpp:508] Replica received write request 
> for position 1
> I0528 12:19:49.642987 30869 leveldb.cpp:343] Persisting action (150 bytes) to 
> leveldb took 57416ns
> I0528 12:19:49.643421 30869 replica.cpp:676] Persisted action at 1
> I0528 12:19:49.644212 30871 replica.cpp:655] Replica received learned notice 
> for position 1
> I0528 12:19:49.644254 30871 leveldb.cpp:343] Persisting action (152 bytes) to 
> leveldb took 25541ns
> I0528 12:19:49.644266 30871 replica.cpp:676] Persisted action at 1
> I0528 12:19:49.644273 30871 replica.cpp:661] Replica learned APPEND action at 
> position 1
> I0528 12:19:49.648641 30872 registrar.cpp:479] Successfully updated 'registry'
> I0528 12:19:49.648681 30872 registrar.cpp:372] Successfully recovered 
> registrar
> I0528 12:19:49.648738 30872 master.cpp:801] Recovered 0 slaves from the 
> Registry (95B) ; allowing 10mins for slaves to re-register
> I0528 12:19:49.650076 30874 log.cpp:699] Attempting to truncate the log to 1
> I0528 12:19:49.650135 30874 coordinator.cpp:340] Coordinator attempting to 
> write TRUNCATE action at position 2
> I0528 12:19:49.650290 30874 replica.cpp:508] Replica received write request 
> for position 2
> I0528 12:19:49.650326 30874 leveldb.cpp:343] Persisting action (16 bytes) to 
> leveldb took 24238ns
> I0528 12:19:49.650337 30874 replica.cpp:676] Persisted action at 2
> I0528 12:19:49.650475 30874 replica.cpp:655] Replica received learned notice 
> for position 2
> I0528 12:19:49.650504 30874 leveldb.cpp:343] Persisting action (18 bytes) to 
> leveldb took 21030ns
> I0528 12:19:49.650521 30874 leveldb.cpp:401] Deleting ~1 keys from leveldb 
> took 10152ns
> I0528 12:19:49.650531 30874 replica.cpp:676] Persisted action at 2
> I0528 12:19:49.650537 30874 replica.cpp:661] Replica learned TRUNCATE action 
> at position 2
> I0528 12:19:49.655011 30875 slave.cpp:143] Slave started on 
> 41)@192.168.122.135:54693
> I0528 12:19:49.655096 30875 credentials.hpp:35] Loading credentials for 
> authentication from 
> '/tmp/MasterTest_StatusUpdateAcknowledgementsThroughMaster_orwcE4/credential'
> I0528 12:19:49.655164 30875 slave.cpp:242] Slave using credential for: 
> test-principal
> I0528 12:19:49.655258 30875 slave.cpp:255] Slave resources: cpus(*):2; 
> mem(*):1024; disk(*):1024; ports(*):[31000-32000]
> I0528 12:19:49.655769 30875 slave.cpp:283] Slave hostname: fedora-19
> I0528 12:19:49.655787 30875 slave.cpp:284] Slave checkpoint: false
> I0528 12:19:49.656296 30868 state.cpp:33] Recovering state from 
> '/tmp/MasterTest_StatusUpdateAcknowledgementsThroughMaster_orwcE4/meta'
> I0528 12:19:49.656734 30875 status_update_manager.cpp:193] Recovering status 
> update manager
> I0528 12:19:49.656857 30875 slave.cpp:3019] Finished recovery
> I0528 12:19:49.657474 30875 slave.cpp:536] New master detected at 
> [email protected]:54693
> I0528 12:19:49.658026 30870 status_update_manager.cpp:167] New master 
> detected at [email protected]:54693
> I0528 12:19:49.658486 30875 slave.cpp:612] Authenticating with master 
> [email protected]:54693
> I0528 12:19:49.658818 30875 slave.cpp:585] Detecting new master
> I0528 12:19:49.658874 30875 authenticatee.hpp:128] Creating new client SASL 
> connection
> I0528 12:19:49.659700 30875 master.cpp:2981] Authenticating 
> slave(41)@192.168.122.135:54693
> I0528 12:19:49.660415 30872 authenticator.hpp:156] Creating new server SASL 
> connection
> I0528 12:19:49.660895 30872 authenticatee.hpp:219] Received SASL 
> authentication mechanisms: CRAM-MD5
> I0528 12:19:49.660919 30872 authenticatee.hpp:245] Attempting to authenticate 
> with mechanism 'CRAM-MD5'
> I0528 12:19:49.660944 30872 authenticator.hpp:262] Received SASL 
> authentication start
> I0528 12:19:49.660996 30872 authenticator.hpp:384] Authentication requires 
> more steps
> I0528 12:19:49.661142 30872 authenticatee.hpp:265] Received SASL 
> authentication step
> I0528 12:19:49.661201 30872 authenticator.hpp:290] Received SASL 
> authentication step
> I0528 12:19:49.661221 30872 auxprop.cpp:81] Request to lookup properties for 
> user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' 
> SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
> SASL_AUXPROP_AUTHZID: false 
> I0528 12:19:49.661228 30872 auxprop.cpp:153] Looking up auxiliary property 
> '*userPassword'
> I0528 12:19:49.661238 30872 auxprop.cpp:153] Looking up auxiliary property 
> '*cmusaslsecretCRAM-MD5'
> I0528 12:19:49.661258 30872 auxprop.cpp:81] Request to lookup properties for 
> user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' 
> SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
> SASL_AUXPROP_AUTHZID: true 
> I0528 12:19:49.661267 30872 auxprop.cpp:103] Skipping auxiliary property 
> '*userPassword' since SASL_AUXPROP_AUTHZID == true
> I0528 12:19:49.661273 30872 auxprop.cpp:103] Skipping auxiliary property 
> '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
> I0528 12:19:49.661284 30872 authenticator.hpp:376] Authentication success
> I0528 12:19:49.661308 30872 authenticatee.hpp:305] Authentication success
> I0528 12:19:49.661351 30872 slave.cpp:669] Successfully authenticated with 
> master [email protected]:54693
> I0528 12:19:49.661396 30872 slave.cpp:903] Will retry registration in 
> 18.309152ms if necessary
> I0528 12:19:49.661963 30875 master.cpp:2239] Queuing up registration request 
> from slave(41)@192.168.122.135:54693 because authentication is still in 
> progress
> I0528 12:19:49.662287 30875 master.cpp:3021] Successfully authenticated 
> principal 'test-principal' at slave(41)@192.168.122.135:54693
> I0528 12:19:49.665110 30875 master.cpp:2297] Registering slave at 
> slave(41)@192.168.122.135:54693 (fedora-19) with id 
> 20140528-121949-2272962752-54693-30850-0
> I0528 12:19:49.666139 30870 registrar.cpp:422] Attempting to update the 
> 'registry'
> I0528 12:19:49.674161 30850 sched.cpp:126] Version: 0.19.0
> I0528 12:19:49.675194 30873 sched.cpp:222] New master detected at 
> [email protected]:54693
> I0528 12:19:49.675230 30873 sched.cpp:273] Authenticating with master 
> [email protected]:54693
> I0528 12:19:49.675297 30873 authenticatee.hpp:128] Creating new client SASL 
> connection
> I0528 12:19:49.675968 30873 master.cpp:2981] Authenticating 
> scheduler(35)@192.168.122.135:54693
> I0528 12:19:49.676316 30873 authenticator.hpp:156] Creating new server SASL 
> connection
> I0528 12:19:49.676792 30873 authenticatee.hpp:219] Received SASL 
> authentication mechanisms: CRAM-MD5
> I0528 12:19:49.676820 30873 authenticatee.hpp:245] Attempting to authenticate 
> with mechanism 'CRAM-MD5'
> I0528 12:19:49.676847 30873 authenticator.hpp:262] Received SASL 
> authentication start
> I0528 12:19:49.676878 30873 authenticator.hpp:384] Authentication requires 
> more steps
> I0528 12:19:49.676903 30873 authenticatee.hpp:265] Received SASL 
> authentication step
> I0528 12:19:49.676934 30873 authenticator.hpp:290] Received SASL 
> authentication step
> I0528 12:19:49.676949 30873 auxprop.cpp:81] Request to lookup properties for 
> user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' 
> SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
> SASL_AUXPROP_AUTHZID: false 
> I0528 12:19:49.676956 30873 auxprop.cpp:153] Looking up auxiliary property 
> '*userPassword'
> I0528 12:19:49.676965 30873 auxprop.cpp:153] Looking up auxiliary property 
> '*cmusaslsecretCRAM-MD5'
> I0528 12:19:49.676975 30873 auxprop.cpp:81] Request to lookup properties for 
> user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' 
> SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
> SASL_AUXPROP_AUTHZID: true 
> I0528 12:19:49.676981 30873 auxprop.cpp:103] Skipping auxiliary property 
> '*userPassword' since SASL_AUXPROP_AUTHZID == true
> I0528 12:19:49.676986 30873 auxprop.cpp:103] Skipping auxiliary property 
> '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
> I0528 12:19:49.676997 30873 authenticator.hpp:376] Authentication success
> I0528 12:19:49.677021 30873 authenticatee.hpp:305] Authentication success
> I0528 12:19:49.677472 30868 sched.cpp:347] Successfully authenticated with 
> master [email protected]:54693
> I0528 12:19:49.677491 30868 sched.cpp:466] Sending registration request to 
> [email protected]:54693
> I0528 12:19:49.677085 30871 master.cpp:3021] Successfully authenticated 
> principal 'test-principal' at scheduler(35)@192.168.122.135:54693
> I0528 12:19:49.677599 30871 master.cpp:1027] Received registration request 
> from scheduler(35)@192.168.122.135:54693
> I0528 12:19:49.677635 30871 master.cpp:1045] Registering framework 
> 20140528-121949-2272962752-54693-30850-0000 at 
> scheduler(35)@192.168.122.135:54693
> I0528 12:19:49.677695 30871 sched.cpp:397] Framework registered with 
> 20140528-121949-2272962752-54693-30850-0000
> I0528 12:19:49.677714 30871 sched.cpp:411] Scheduler::registered took 12474ns
> I0528 12:19:49.677744 30871 hierarchical_allocator_process.hpp:331] Added 
> framework 20140528-121949-2272962752-54693-30850-0000
> I0528 12:19:49.677754 30871 hierarchical_allocator_process.hpp:725] No 
> resources available to allocate!
> I0528 12:19:49.677760 30871 hierarchical_allocator_process.hpp:687] Performed 
> allocation for 0 slaves in 10063ns
> I0528 12:19:49.680667 30873 slave.cpp:903] Will retry registration in 
> 24.147524ms if necessary
> I0528 12:19:49.680727 30873 master.cpp:2285] Ignoring register slave message 
> from slave(41)@192.168.122.135:54693 (fedora-19) as admission is already in 
> progress
> I0528 12:19:49.705091 30873 slave.cpp:903] Will retry registration in 
> 13.305655ms if necessary
> I0528 12:19:49.705150 30873 master.cpp:2285] Ignoring register slave message 
> from slave(41)@192.168.122.135:54693 (fedora-19) as admission is already in 
> progress
> I0528 12:19:49.719496 30873 slave.cpp:903] Will retry registration in 
> 101.534376ms if necessary
> I0528 12:19:49.719555 30873 master.cpp:2285] Ignoring register slave message 
> from slave(41)@192.168.122.135:54693 (fedora-19) as admission is already in 
> progress
> I0528 12:19:49.821956 30873 slave.cpp:903] Will retry registration in 
> 51.019643ms if necessary
> I0528 12:19:49.822016 30873 master.cpp:2285] Ignoring register slave message 
> from slave(41)@192.168.122.135:54693 (fedora-19) as admission is already in 
> progress
> I0528 12:19:49.873402 30873 slave.cpp:903] Will retry registration in 
> 623.788265ms if necessary
> I0528 12:19:49.873461 30873 master.cpp:2285] Ignoring register slave message 
> from slave(41)@192.168.122.135:54693 (fedora-19) as admission is already in 
> progress
> I0528 12:19:50.498261 30873 slave.cpp:903] Will retry registration in 
> 1.10917121secs if necessary
> I0528 12:19:50.498359 30873 master.cpp:2285] Ignoring register slave message 
> from slave(41)@192.168.122.135:54693 (fedora-19) as admission is already in 
> progress
> I0528 12:19:50.632482 30873 hierarchical_allocator_process.hpp:725] No 
> resources available to allocate!
> I0528 12:19:50.632500 30873 hierarchical_allocator_process.hpp:687] Performed 
> allocation for 0 slaves in 29372ns
> I0528 12:19:51.608304 30873 slave.cpp:903] Will retry registration in 
> 2.038690505secs if necessary
> I0528 12:19:51.608422 30873 master.cpp:2285] Ignoring register slave message 
> from slave(41)@192.168.122.135:54693 (fedora-19) as admission is already in 
> progress
> I0528 12:19:51.633369 30873 hierarchical_allocator_process.hpp:725] No 
> resources available to allocate!
> I0528 12:19:51.633388 30873 hierarchical_allocator_process.hpp:687] Performed 
> allocation for 0 slaves in 29261ns
> I0528 12:19:52.634245 30873 hierarchical_allocator_process.hpp:725] No 
> resources available to allocate!
> I0528 12:19:52.634287 30873 hierarchical_allocator_process.hpp:687] Performed 
> allocation for 0 slaves in 56100ns
> I0528 12:19:53.634836 30873 hierarchical_allocator_process.hpp:725] No 
> resources available to allocate!
> I0528 12:19:53.634865 30873 hierarchical_allocator_process.hpp:687] Performed 
> allocation for 0 slaves in 45025ns
> I0528 12:19:53.648061 30873 slave.cpp:903] Will retry registration in 
> 4.21929484secs if necessary
> I0528 12:19:53.648147 30873 master.cpp:2285] Ignoring register slave message 
> from slave(41)@192.168.122.135:54693 (fedora-19) as admission is already in 
> progress
> I0528 12:19:54.632284 30873 master.cpp:108] No whitelist given. Advertising 
> offers for all slaves
> I0528 12:19:54.635412 30873 hierarchical_allocator_process.hpp:725] No 
> resources available to allocate!
> I0528 12:19:54.635432 30873 hierarchical_allocator_process.hpp:687] Performed 
> allocation for 0 slaves in 27871ns
> E0528 12:19:54.668265 30873 registrar.cpp:500] Registrar aborting: Failed to 
> update 'registry': Failed to perform store within 5secs
> F0528 12:19:54.668367 30873 master.cpp:2319] Failed to admit slave 
> 20140528-121949-2272962752-54693-30850-0 at slave(41)@192.168.122.135:54693 
> (fedora-19): Failed to update 'registry': Failed to perform store within 5secs
> *** Check failure stack trace: ***
>     @     0x7f2632732ac6  google::LogMessage::SendToLog()
>     @     0x7f2632732f2e  google::LogMessage::Flush()
>     @     0x7f26327365c9  google::LogMessageFatal::~LogMessageFatal()
>     @     0x7f26320b52fc  mesos::internal::master::Master::_registerSlave()
>     @     0x7f263263279c  process::ProcessManager::resume()
>     @     0x7f26326322ef  process::schedule()
>     @       0x3284c07c53  (unknown)
>     @       0x32844f5dbd  (unknown)
> make[3]: *** [check-local] Aborted (core dumped)
> {noformat}
> 1. In some cases 5secs timeout for registry storage is probably too small. We 
> can use a bigger value for the flag {{registry_store_timeout}} at least in 
> tests.
> 2. Use EXIT(1)



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to