[ 
https://issues.apache.org/jira/browse/MESOS-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048388#comment-14048388
 ] 

Yifan Gu commented on MESOS-1543:
---------------------------------

Wait. I don't think that will help...
If the master is still in recovery when the slave and the framework try to 
register, then the slave will fail to register for the first time, and wait for 
some while, then do a second registration, but by the time the slave 
successfully registered, the framework will have already sent out a second 
registeration message, and will get registered too, which is not we want...

> MasterTest.OrphanTasks is flaky
> -------------------------------
>
>                 Key: MESOS-1543
>                 URL: https://issues.apache.org/jira/browse/MESOS-1543
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>            Reporter: Vinod Kone
>            Assignee: Yifan Gu
>
> Looks like the slave's detector was discarded before the slave shutdown.
> {code}
> [ RUN      ] MasterTest.OrphanTasks
> Using temporary directory '/tmp/MasterTest_OrphanTasks_s8r6NU'
> I0625 16:07:58.861265 26203 leveldb.cpp:176] Opened db in 48.741714ms
> I0625 16:07:58.877020 26203 leveldb.cpp:183] Compacted db in 15.444957ms
> I0625 16:07:58.877488 26203 leveldb.cpp:198] Created db iterator in 8308ns
> I0625 16:07:58.877691 26203 leveldb.cpp:204] Seeked to beginning of db in 
> 2133ns
> I0625 16:07:58.878000 26203 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 1896ns
> I0625 16:07:58.878221 26203 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0625 16:07:58.878715 26229 recover.cpp:425] Starting replica recovery
> I0625 16:07:58.878783 26229 recover.cpp:451] Replica is in EMPTY status
> I0625 16:07:58.879034 26229 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I0625 16:07:58.879088 26229 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I0625 16:07:58.879166 26229 recover.cpp:542] Updating replica status to 
> STARTING
> I0625 16:07:58.882308 26228 master.cpp:288] Master 
> 20140625-160758-16842879-47563-26203 (lucid) started on 127.0.1.1:47563
> I0625 16:07:58.882339 26228 master.cpp:325] Master only allowing 
> authenticated frameworks to register
> I0625 16:07:58.882345 26228 master.cpp:330] Master only allowing 
> authenticated slaves to register
> I0625 16:07:58.882351 26228 credentials.hpp:35] Loading credentials for 
> authentication from '/tmp/MasterTest_OrphanTasks_s8r6NU/credentials'
> I0625 16:07:58.882427 26228 master.cpp:356] Authorization enabled
> I0625 16:07:58.883036 26228 hierarchical_allocator_process.hpp:301] 
> Initializing hierarchical allocator process with master : 
> [email protected]:47563
> I0625 16:07:58.883080 26228 master.cpp:122] No whitelist given. Advertising 
> offers for all slaves
> I0625 16:07:58.883249 26228 master.cpp:1122] The newly elected leader is 
> [email protected]:47563 with id 20140625-160758-16842879-47563-26203
> I0625 16:07:58.883262 26228 master.cpp:1135] Elected as the leading master!
> I0625 16:07:58.883268 26228 master.cpp:953] Recovering from registrar
> I0625 16:07:58.883323 26228 registrar.cpp:313] Recovering registrar
> I0625 16:07:58.895361 26229 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 16.126207ms
> I0625 16:07:58.895427 26229 replica.cpp:320] Persisted replica status to 
> STARTING
> I0625 16:07:58.895555 26229 recover.cpp:451] Replica is in STARTING status
> I0625 16:07:58.895905 26229 replica.cpp:638] Replica in STARTING status 
> received a broadcasted recover request
> I0625 16:07:58.895972 26229 recover.cpp:188] Received a recover response from 
> a replica in STARTING status
> I0625 16:07:58.896081 26229 recover.cpp:542] Updating replica status to VOTING
> I0625 16:07:58.914243 26229 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 18.082424ms
> I0625 16:07:58.914566 26229 replica.cpp:320] Persisted replica status to 
> VOTING
> I0625 16:07:58.914927 26229 recover.cpp:556] Successfully joined the Paxos 
> group
> I0625 16:07:58.915011 26229 recover.cpp:440] Recover process terminated
> I0625 16:07:58.915163 26229 log.cpp:656] Attempting to start the writer
> I0625 16:07:58.915468 26229 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I0625 16:07:58.935538 26229 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 20.034561ms
> I0625 16:07:58.936040 26229 replica.cpp:342] Persisted promised to 1
> I0625 16:07:58.947181 26230 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I0625 16:07:58.947559 26230 replica.cpp:375] Replica received explicit 
> promise request for position 0 with proposal 2
> I0625 16:07:58.962541 26230 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 14.946528ms
> I0625 16:07:58.962818 26230 replica.cpp:676] Persisted action at 0
> I0625 16:07:58.963426 26230 replica.cpp:508] Replica received write request 
> for position 0
> I0625 16:07:58.963462 26230 leveldb.cpp:438] Reading position from leveldb 
> took 19380ns
> I0625 16:07:58.982928 26230 leveldb.cpp:343] Persisting action (14 bytes) to 
> leveldb took 19.433595ms
> I0625 16:07:58.983219 26230 replica.cpp:676] Persisted action at 0
> I0625 16:07:58.987048 26223 replica.cpp:655] Replica received learned notice 
> for position 0
> I0625 16:07:59.000260 26223 leveldb.cpp:343] Persisting action (16 bytes) to 
> leveldb took 13.156945ms
> I0625 16:07:59.000320 26223 replica.cpp:676] Persisted action at 0
> I0625 16:07:59.000331 26223 replica.cpp:661] Replica learned NOP action at 
> position 0
> I0625 16:07:59.000578 26223 log.cpp:672] Writer started with ending position 0
> I0625 16:07:59.000834 26223 leveldb.cpp:438] Reading position from leveldb 
> took 22576ns
> I0625 16:07:59.002019 26223 registrar.cpp:346] Successfully fetched the 
> registry (0B)
> I0625 16:07:59.002043 26223 registrar.cpp:422] Attempting to update the 
> 'registry'
> I0625 16:07:59.003206 26223 log.cpp:680] Attempting to append 118 bytes to 
> the log
> I0625 16:07:59.003253 26223 coordinator.cpp:340] Coordinator attempting to 
> write APPEND action at position 1
> I0625 16:07:59.003466 26223 replica.cpp:508] Replica received write request 
> for position 1
> I0625 16:07:59.018249 26223 leveldb.cpp:343] Persisting action (135 bytes) to 
> leveldb took 14.747648ms
> I0625 16:07:59.018712 26223 replica.cpp:676] Persisted action at 1
> I0625 16:07:59.027173 26227 replica.cpp:655] Replica received learned notice 
> for position 1
> I0625 16:07:59.041352 26227 leveldb.cpp:343] Persisting action (137 bytes) to 
> leveldb took 14.12408ms
> I0625 16:07:59.041412 26227 replica.cpp:676] Persisted action at 1
> I0625 16:07:59.041424 26227 replica.cpp:661] Replica learned APPEND action at 
> position 1
> I0625 16:07:59.041815 26227 registrar.cpp:479] Successfully updated 'registry'
> I0625 16:07:59.041858 26227 registrar.cpp:372] Successfully recovered 
> registrar
> I0625 16:07:59.041903 26227 log.cpp:699] Attempting to truncate the log to 1
> I0625 16:07:59.041981 26227 master.cpp:980] Recovered 0 slaves from the 
> Registry (82B) ; allowing 10mins for slaves to re-register
> I0625 16:07:59.042021 26227 coordinator.cpp:340] Coordinator attempting to 
> write TRUNCATE action at position 2
> I0625 16:07:59.042628 26226 replica.cpp:508] Replica received write request 
> for position 2
> I0625 16:07:59.046061 26228 slave.cpp:168] Slave started on 
> 117)@127.0.1.1:47563
> I0625 16:07:59.046083 26228 credentials.hpp:35] Loading credentials for 
> authentication from '/tmp/MasterTest_OrphanTasks_R5G2iM/credential'
> I0625 16:07:59.046162 26228 slave.cpp:268] Slave using credential for: 
> test-principal
> I0625 16:07:59.046252 26228 slave.cpp:281] Slave resources: cpus(*):2; 
> mem(*):1024; disk(*):1024; ports(*):[31000-32000]
> I0625 16:07:59.046324 26228 slave.cpp:309] Slave hostname: lucid
> I0625 16:07:59.046334 26228 slave.cpp:310] Slave checkpoint: false
> I0625 16:07:59.046794 26228 state.cpp:33] Recovering state from 
> '/tmp/MasterTest_OrphanTasks_R5G2iM/meta'
> I0625 16:07:59.046892 26228 status_update_manager.cpp:193] Recovering status 
> update manager
> I0625 16:07:59.046967 26228 slave.cpp:3111] Finished recovery
> I0625 16:07:59.047200 26228 slave.cpp:584] New master detected at 
> [email protected]:47563
> I0625 16:07:59.047224 26228 slave.cpp:660] Authenticating with master 
> [email protected]:47563
> I0625 16:07:59.047263 26228 slave.cpp:633] Detecting new master
> I0625 16:07:59.047297 26228 status_update_manager.cpp:167] New master 
> detected at [email protected]:47563
> I0625 16:07:59.047333 26228 authenticatee.hpp:128] Creating new client SASL 
> connection
> I0625 16:07:59.047412 26228 master.cpp:3499] Authenticating 
> slave(117)@127.0.1.1:47563
> I0625 16:07:59.047493 26228 authenticator.hpp:156] Creating new server SASL 
> connection
> I0625 16:07:59.047550 26228 authenticatee.hpp:219] Received SASL 
> authentication mechanisms: CRAM-MD5
> I0625 16:07:59.047564 26228 authenticatee.hpp:245] Attempting to authenticate 
> with mechanism 'CRAM-MD5'
> I0625 16:07:59.047585 26228 authenticator.hpp:262] Received SASL 
> authentication start
> I0625 16:07:59.047621 26228 authenticator.hpp:384] Authentication requires 
> more steps
> I0625 16:07:59.047646 26228 authenticatee.hpp:265] Received SASL 
> authentication step
> I0625 16:07:59.047678 26228 authenticator.hpp:290] Received SASL 
> authentication step
> I0625 16:07:59.047693 26228 auxprop.cpp:81] Request to lookup properties for 
> user: 'test-principal' realm: 'lucid' server FQDN: 'lucid' 
> SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false 
> I0625 16:07:59.047699 26228 auxprop.cpp:153] Looking up auxiliary property 
> '*userPassword'
> I0625 16:07:59.047708 26228 auxprop.cpp:153] Looking up auxiliary property 
> '*cmusaslsecretCRAM-MD5'
> I0625 16:07:59.047716 26228 auxprop.cpp:81] Request to lookup properties for 
> user: 'test-principal' realm: 'lucid' server FQDN: 'lucid' 
> SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true 
> I0625 16:07:59.047721 26228 auxprop.cpp:103] Skipping auxiliary property 
> '*userPassword' since SASL_AUXPROP_AUTHZID == true
> I0625 16:07:59.047726 26228 auxprop.cpp:103] Skipping auxiliary property 
> '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
> I0625 16:07:59.047736 26228 authenticator.hpp:376] Authentication success
> I0625 16:07:59.047760 26228 authenticatee.hpp:305] Authentication success
> I0625 16:07:59.047780 26228 master.cpp:3539] Successfully authenticated 
> principal 'test-principal' at slave(117)@127.0.1.1:47563
> I0625 16:07:59.047847 26228 slave.cpp:717] Successfully authenticated with 
> master [email protected]:47563
> I0625 16:07:59.047886 26228 slave.cpp:955] Will retry registration in 
> 11.156008ms if necessary
> I0625 16:07:59.047976 26223 master.cpp:2781] Registering slave at 
> slave(117)@127.0.1.1:47563 (lucid) with id 
> 20140625-160758-16842879-47563-26203-0
> I0625 16:07:59.048074 26223 registrar.cpp:422] Attempting to update the 
> 'registry'
> I0625 16:07:59.051329 26203 sched.cpp:139] Version: 0.20.0
> I0625 16:07:59.051686 26225 sched.cpp:235] New master detected at 
> [email protected]:47563
> I0625 16:07:59.051718 26225 sched.cpp:285] Authenticating with master 
> [email protected]:47563
> I0625 16:07:59.051816 26225 authenticatee.hpp:128] Creating new client SASL 
> connection
> I0625 16:07:59.051900 26225 master.cpp:3499] Authenticating 
> [email protected]:47563
> I0625 16:07:59.051978 26225 authenticator.hpp:156] Creating new server SASL 
> connection
> I0625 16:07:59.052034 26225 authenticatee.hpp:219] Received SASL 
> authentication mechanisms: CRAM-MD5
> I0625 16:07:59.052047 26225 authenticatee.hpp:245] Attempting to authenticate 
> with mechanism 'CRAM-MD5'
> I0625 16:07:59.052067 26225 authenticator.hpp:262] Received SASL 
> authentication start
> I0625 16:07:59.052099 26225 authenticator.hpp:384] Authentication requires 
> more steps
> I0625 16:07:59.052124 26225 authenticatee.hpp:265] Received SASL 
> authentication step
> I0625 16:07:59.052155 26225 authenticator.hpp:290] Received SASL 
> authentication step
> I0625 16:07:59.052168 26225 auxprop.cpp:81] Request to lookup properties for 
> user: 'test-principal' realm: 'lucid' server FQDN: 'lucid' 
> SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false 
> I0625 16:07:59.052175 26225 auxprop.cpp:153] Looking up auxiliary property 
> '*userPassword'
> I0625 16:07:59.052183 26225 auxprop.cpp:153] Looking up auxiliary property 
> '*cmusaslsecretCRAM-MD5'
> I0625 16:07:59.052191 26225 auxprop.cpp:81] Request to lookup properties for 
> user: 'test-principal' realm: 'lucid' server FQDN: 'lucid' 
> SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true 
> I0625 16:07:59.052196 26225 auxprop.cpp:103] Skipping auxiliary property 
> '*userPassword' since SASL_AUXPROP_AUTHZID == true
> I0625 16:07:59.052201 26225 auxprop.cpp:103] Skipping auxiliary property 
> '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
> I0625 16:07:59.052211 26225 authenticator.hpp:376] Authentication success
> I0625 16:07:59.052234 26225 authenticatee.hpp:305] Authentication success
> I0625 16:07:59.052255 26225 master.cpp:3539] Successfully authenticated 
> principal 'test-principal' at 
> [email protected]:47563
> I0625 16:07:59.052322 26225 sched.cpp:359] Successfully authenticated with 
> master [email protected]:47563
> I0625 16:07:59.052335 26225 sched.cpp:478] Sending registration request to 
> [email protected]:47563
> I0625 16:07:59.052376 26225 master.cpp:1241] Received registration request 
> from [email protected]:47563
> I0625 16:07:59.052393 26225 master.cpp:1201] Authorizing framework principal 
> 'test-principal' to receive offers for role '*'
> I0625 16:07:59.052490 26225 master.cpp:1300] Registering framework 
> 20140625-160758-16842879-47563-26203-0000 at 
> [email protected]:47563
> I0625 16:07:59.052580 26225 sched.cpp:409] Framework registered with 
> 20140625-160758-16842879-47563-26203-0000
> I0625 16:07:59.052602 26225 sched.cpp:423] Scheduler::registered took 8220ns
> I0625 16:07:59.052634 26225 hierarchical_allocator_process.hpp:331] Added 
> framework 20140625-160758-16842879-47563-26203-0000
> I0625 16:07:59.052642 26225 hierarchical_allocator_process.hpp:724] No 
> resources available to allocate!
> I0625 16:07:59.052647 26225 hierarchical_allocator_process.hpp:686] Performed 
> allocation for 0 slaves in 5845ns
> I0625 16:07:59.061020 26226 leveldb.cpp:343] Persisting action (16 bytes) to 
> leveldb took 18.344559ms
> I0625 16:07:59.061079 26226 replica.cpp:676] Persisted action at 2
> I0625 16:07:59.061353 26226 replica.cpp:655] Replica received learned notice 
> for position 2
> I0625 16:07:59.066944 26223 slave.cpp:955] Will retry registration in 
> 32.268831ms if necessary
> I0625 16:07:59.067044 26223 master.cpp:2769] Ignoring register slave message 
> from slave(117)@127.0.1.1:47563 (lucid) as admission is already in progress
> I0625 16:07:59.077322 26226 leveldb.cpp:343] Persisting action (18 bytes) to 
> leveldb took 15.931404ms
> I0625 16:07:59.077587 26226 leveldb.cpp:401] Deleting ~1 keys from leveldb 
> took 51614ns
> I0625 16:07:59.077605 26226 replica.cpp:676] Persisted action at 2
> I0625 16:07:59.077615 26226 replica.cpp:661] Replica learned TRUNCATE action 
> at position 2
> I0625 16:07:59.078006 26226 log.cpp:680] Attempting to append 289 bytes to 
> the log
> I0625 16:07:59.078053 26226 coordinator.cpp:340] Coordinator attempting to 
> write APPEND action at position 3
> I0625 16:07:59.078299 26226 replica.cpp:508] Replica received write request 
> for position 3
> I0625 16:07:59.094300 26226 leveldb.cpp:343] Persisting action (308 bytes) to 
> leveldb took 15.951731ms
> I0625 16:07:59.094590 26226 replica.cpp:676] Persisted action at 3
> I0625 16:07:59.107091 26227 replica.cpp:655] Replica received learned notice 
> for position 3
> I0625 16:07:59.107347 26224 slave.cpp:955] Will retry registration in 
> 75.040916ms if necessary
> I0625 16:07:59.107457 26224 master.cpp:2769] Ignoring register slave message 
> from slave(117)@127.0.1.1:47563 (lucid) as admission is already in progress
> I0625 16:07:59.118427 26227 leveldb.cpp:343] Persisting action (310 bytes) to 
> leveldb took 11.273294ms
> I0625 16:07:59.118489 26227 replica.cpp:676] Persisted action at 3
> I0625 16:07:59.118499 26227 replica.cpp:661] Replica learned APPEND action at 
> position 3
> I0625 16:07:59.118921 26227 registrar.cpp:479] Successfully updated 'registry'
> I0625 16:07:59.119007 26227 log.cpp:699] Attempting to truncate the log to 3
> I0625 16:07:59.119071 26227 master.cpp:2821] Registered slave 
> 20140625-160758-16842879-47563-26203-0 at slave(117)@127.0.1.1:47563 (lucid)
> I0625 16:07:59.119088 26227 master.cpp:3967] Adding slave 
> 20140625-160758-16842879-47563-26203-0 at slave(117)@127.0.1.1:47563 (lucid) 
> with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
> I0625 16:07:59.119207 26227 coordinator.cpp:340] Coordinator attempting to 
> write TRUNCATE action at position 4
> I0625 16:07:59.119288 26227 slave.cpp:751] Registered with master 
> [email protected]:47563; given slave ID 20140625-160758-16842879-47563-26203-0
> I0625 16:07:59.119355 26227 hierarchical_allocator_process.hpp:444] Added 
> slave 20140625-160758-16842879-47563-26203-0 (lucid) with cpus(*):2; 
> mem(*):1024; disk(*):1024; ports(*):[31000-32000] (and cpus(*):2; 
> mem(*):1024; disk(*):1024; ports(*):[31000-32000] available)
> I0625 16:07:59.119410 26227 hierarchical_allocator_process.hpp:750] Offering 
> cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 
> 20140625-160758-16842879-47563-26203-0 to framework 
> 20140625-160758-16842879-47563-26203-0000
> I0625 16:07:59.119496 26227 hierarchical_allocator_process.hpp:706] Performed 
> allocation for slave 20140625-160758-16842879-47563-26203-0 in 110886ns
> I0625 16:07:59.119549 26227 slave.cpp:2308] Received ping from 
> slave-observer(97)@127.0.1.1:47563
> I0625 16:07:59.119604 26227 master.hpp:794] Adding offer 
> 20140625-160758-16842879-47563-26203-0 with resources cpus(*):2; mem(*):1024; 
> disk(*):1024; ports(*):[31000-32000] on slave 
> 20140625-160758-16842879-47563-26203-0 (lucid)
> I0625 16:07:59.119644 26227 master.cpp:3446] Sending 1 offers to framework 
> 20140625-160758-16842879-47563-26203-0000
> I0625 16:07:59.119900 26227 sched.cpp:546] Scheduler::resourceOffers took 
> 125487ns
> I0625 16:07:59.120072 26227 master.hpp:804] Removing offer 
> 20140625-160758-16842879-47563-26203-0 with resources cpus(*):2; mem(*):1024; 
> disk(*):1024; ports(*):[31000-32000] on slave 
> 20140625-160758-16842879-47563-26203-0 (lucid)
> I0625 16:07:59.120121 26227 master.cpp:2125] Processing reply for offers: [ 
> 20140625-160758-16842879-47563-26203-0 ] on slave 
> 20140625-160758-16842879-47563-26203-0 at slave(117)@127.0.1.1:47563 (lucid) 
> for framework 20140625-160758-16842879-47563-26203-0000
> I0625 16:07:59.120141 26227 master.cpp:2211] Authorizing framework principal 
> 'test-principal' to launch task 0 as user 'jenkins'
> I0625 16:07:59.120472 26227 master.hpp:766] Adding task 0 with resources 
> cpus(*):1; mem(*):64 on slave 20140625-160758-16842879-47563-26203-0 (lucid)
> I0625 16:07:59.120501 26227 master.cpp:2277] Launching task 0 of framework 
> 20140625-160758-16842879-47563-26203-0000 with resources cpus(*):1; mem(*):64 
> on slave 20140625-160758-16842879-47563-26203-0 at slave(117)@127.0.1.1:47563 
> (lucid)
> I0625 16:07:59.120621 26227 replica.cpp:508] Replica received write request 
> for position 4
> I0625 16:07:59.121064 26224 slave.cpp:986] Got assigned task 0 for framework 
> 20140625-160758-16842879-47563-26203-0000
> I0625 16:07:59.121354 26224 slave.cpp:1096] Launching task 0 for framework 
> 20140625-160758-16842879-47563-26203-0000
> I0625 16:07:59.124053 26224 exec.cpp:131] Version: 0.20.0
> I0625 16:07:59.124178 26224 slave.cpp:1206] Queuing task '0' for executor 
> default of framework '20140625-160758-16842879-47563-26203-0000
> I0625 16:07:59.124223 26224 slave.cpp:2453] Monitoring executor 'default' of 
> framework '20140625-160758-16842879-47563-26203-0000' in container 
> '42b56b06-5a72-41bb-8137-67926c47c9e9'
> I0625 16:07:59.124274 26224 hierarchical_allocator_process.hpp:546] Framework 
> 20140625-160758-16842879-47563-26203-0000 left cpus(*):1; mem(*):960; 
> disk(*):1024; ports(*):[31000-32000] unused on slave 
> 20140625-160758-16842879-47563-26203-0
> I0625 16:07:59.124358 26224 hierarchical_allocator_process.hpp:588] Framework 
> 20140625-160758-16842879-47563-26203-0000 filtered slave 
> 20140625-160758-16842879-47563-26203-0 for 5secs
> I0625 16:07:59.124452 26224 exec.cpp:181] Executor started at: 
> executor(36)@127.0.1.1:47563 with pid 26203
> I0625 16:07:59.124528 26224 slave.cpp:545] Successfully attached file 
> '/tmp/MasterTest_OrphanTasks_R5G2iM/slaves/20140625-160758-16842879-47563-26203-0/frameworks/20140625-160758-16842879-47563-26203-0000/executors/default/runs/42b56b06-5a72-41bb-8137-67926c47c9e9'
> I0625 16:07:59.124557 26224 slave.cpp:1717] Got registration for executor 
> 'default' of framework 20140625-160758-16842879-47563-26203-0000
> I0625 16:07:59.124634 26224 slave.cpp:1836] Flushing queued task 0 for 
> executor 'default' of framework 20140625-160758-16842879-47563-26203-0000
> I0625 16:07:59.124703 26224 exec.cpp:205] Executor registered on slave 
> 20140625-160758-16842879-47563-26203-0
> I0625 16:07:59.125650 26224 exec.cpp:217] Executor::registered took 15164ns
> I0625 16:07:59.125702 26224 exec.cpp:292] Executor asked to run task '0'
> I0625 16:07:59.125732 26224 exec.cpp:301] Executor::launchTask took 21us
> I0625 16:07:59.127326 26224 exec.cpp:524] Executor sending status update 
> TASK_RUNNING (UUID: 77f58dcb-2ae3-45d2-a071-2c808614ab3a) for task 0 of 
> framework 20140625-160758-16842879-47563-26203-0000
> I0625 16:07:59.127782 26223 slave.cpp:2071] Handling status update 
> TASK_RUNNING (UUID: 77f58dcb-2ae3-45d2-a071-2c808614ab3a) for task 0 of 
> framework 20140625-160758-16842879-47563-26203-0000 from 
> executor(36)@127.0.1.1:47563
> I0625 16:07:59.127879 26223 status_update_manager.cpp:320] Received status 
> update TASK_RUNNING (UUID: 77f58dcb-2ae3-45d2-a071-2c808614ab3a) for task 0 
> of framework 20140625-160758-16842879-47563-26203-0000
> I0625 16:07:59.127893 26223 status_update_manager.cpp:499] Creating 
> StatusUpdate stream for task 0 of framework 
> 20140625-160758-16842879-47563-26203-0000
> I0625 16:07:59.127962 26223 status_update_manager.cpp:373] Forwarding status 
> update TASK_RUNNING (UUID: 77f58dcb-2ae3-45d2-a071-2c808614ab3a) for task 0 
> of framework 20140625-160758-16842879-47563-26203-0000 to 
> [email protected]:47563
> I0625 16:07:59.128090 26223 master.cpp:3107] Status update TASK_RUNNING 
> (UUID: 77f58dcb-2ae3-45d2-a071-2c808614ab3a) for task 0 of framework 
> 20140625-160758-16842879-47563-26203-0000 from slave 
> 20140625-160758-16842879-47563-26203-0 at slave(117)@127.0.1.1:47563 (lucid)
> I0625 16:07:59.128124 26223 slave.cpp:2229] Status update manager 
> successfully handled status update TASK_RUNNING (UUID: 
> 77f58dcb-2ae3-45d2-a071-2c808614ab3a) for task 0 of framework 
> 20140625-160758-16842879-47563-26203-0000
> I0625 16:07:59.128134 26223 slave.cpp:2235] Sending acknowledgement for 
> status update TASK_RUNNING (UUID: 77f58dcb-2ae3-45d2-a071-2c808614ab3a) for 
> task 0 of framework 20140625-160758-16842879-47563-26203-0000 to 
> executor(36)@127.0.1.1:47563
> I0625 16:07:59.128206 26223 sched.cpp:637] Scheduler::statusUpdate took 
> 18758ns
> I0625 16:07:59.128267 26223 master.cpp:2631] Forwarding status update 
> acknowledgement 77f58dcb-2ae3-45d2-a071-2c808614ab3a for task 0 of framework 
> 20140625-160758-16842879-47563-26203-0000 to slave 
> 20140625-160758-16842879-47563-26203-0 at slave(117)@127.0.1.1:47563 (lucid)
> I0625 16:07:59.128363 26223 status_update_manager.cpp:398] Received status 
> update acknowledgement (UUID: 77f58dcb-2ae3-45d2-a071-2c808614ab3a) for task 
> 0 of framework 20140625-160758-16842879-47563-26203-0000
> I0625 16:07:59.128412 26223 slave.cpp:1657] Status update manager 
> successfully handled status update acknowledgement (UUID: 
> 77f58dcb-2ae3-45d2-a071-2c808614ab3a) for task 0 of framework 
> 20140625-160758-16842879-47563-26203-0000
> I0625 16:07:59.129055 26226 process.cpp:3335] Handling HTTP event for process 
> 'master' with path: '/master/state.json'
> I0625 16:07:59.129106 26226 http.cpp:452] HTTP request for 
> '/master/state.json'
> I0625 16:07:59.129865 26231 process.cpp:1037] Socket closed while receiving
> I0625 16:07:59.130645 26203 master.cpp:619] Master terminating
> I0625 16:07:59.130695 26203 master.hpp:784] Removing task 0 with resources 
> cpus(*):1; mem(*):64 on slave 20140625-160758-16842879-47563-26203-0 (lucid)
> W0625 16:07:59.130738 26203 master.cpp:4253] Removing task 0 of framework 
> 20140625-160758-16842879-47563-26203-0000 and slave 
> 20140625-160758-16842879-47563-26203-0 in non-terminal state TASK_RUNNING
> I0625 16:07:59.131604 26226 slave.cpp:2315] [email protected]:47563 exited
> W0625 16:07:59.131628 26226 slave.cpp:2318] Master disconnected! Waiting for 
> a new master to be elected
> I0625 16:07:59.132665 26224 exec.cpp:338] Executor received status update 
> acknowledgement 77f58dcb-2ae3-45d2-a071-2c808614ab3a for task 0 of framework 
> 20140625-160758-16842879-47563-26203-0000
> I0625 16:07:59.138859 26227 leveldb.cpp:343] Persisting action (16 bytes) to 
> leveldb took 18.191907ms
> I0625 16:07:59.138913 26227 replica.cpp:676] Persisted action at 4
> I0625 16:07:59.190522 26203 leveldb.cpp:176] Opened db in 42.905633ms
> I0625 16:07:59.229253 26203 leveldb.cpp:183] Compacted db in 38.417135ms
> I0625 16:07:59.229497 26203 leveldb.cpp:198] Created db iterator in 9638ns
> I0625 16:07:59.229696 26203 leveldb.cpp:204] Seeked to beginning of db in 
> 19529ns
> I0625 16:07:59.229907 26203 leveldb.cpp:273] Iterated through 5 keys in the 
> db in 46457ns
> I0625 16:07:59.230082 26203 replica.cpp:741] Replica recovered with log 
> positions 1 -> 4 with 0 holes and 1 unlearned
> I0625 16:07:59.230548 26226 recover.cpp:425] Starting replica recovery
> I0625 16:07:59.230633 26226 recover.cpp:451] Replica is in VOTING status
> I0625 16:07:59.230676 26226 recover.cpp:440] Recover process terminated
> I0625 16:07:59.234074 26224 master.cpp:288] Master 
> 20140625-160759-16842879-47563-26203 (lucid) started on 127.0.1.1:47563
> I0625 16:07:59.234104 26224 master.cpp:325] Master only allowing 
> authenticated frameworks to register
> I0625 16:07:59.234110 26224 master.cpp:330] Master only allowing 
> authenticated slaves to register
> I0625 16:07:59.234118 26224 credentials.hpp:35] Loading credentials for 
> authentication from '/tmp/MasterTest_OrphanTasks_s8r6NU/credentials'
> I0625 16:07:59.234195 26224 master.cpp:356] Authorization enabled
> I0625 16:07:59.234711 26224 hierarchical_allocator_process.hpp:301] 
> Initializing hierarchical allocator process with master : 
> [email protected]:47563
> I0625 16:07:59.234755 26224 master.cpp:122] No whitelist given. Advertising 
> offers for all slaves
> I0625 16:07:59.234928 26224 master.cpp:1122] The newly elected leader is 
> [email protected]:47563 with id 20140625-160759-16842879-47563-26203
> I0625 16:07:59.234941 26224 master.cpp:1135] Elected as the leading master!
> I0625 16:07:59.234948 26224 master.cpp:953] Recovering from registrar
> I0625 16:07:59.235004 26224 registrar.cpp:313] Recovering registrar
> I0625 16:07:59.235229 26224 log.cpp:656] Attempting to start the writer
> I0625 16:07:59.235543 26224 replica.cpp:474] Replica received implicit 
> promise request with proposal 2
> I0625 16:07:59.246944 26224 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 11.362596ms
> I0625 16:07:59.247259 26224 replica.cpp:342] Persisted promised to 2
> I0625 16:07:59.257194 26230 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I0625 16:07:59.257627 26230 replica.cpp:375] Replica received explicit 
> promise request for position 4 with proposal 3
> I0625 16:07:59.257693 26230 leveldb.cpp:438] Reading position from leveldb 
> took 45817ns
> I0625 16:07:59.270833 26230 leveldb.cpp:343] Persisting action (16 bytes) to 
> leveldb took 13.105094ms
> I0625 16:07:59.271126 26230 replica.cpp:676] Persisted action at 4
> I0625 16:07:59.271867 26230 replica.cpp:508] Replica received write request 
> for position 4
> I0625 16:07:59.272071 26230 leveldb.cpp:438] Reading position from leveldb 
> took 24316ns
> I0625 16:07:59.283974 26230 leveldb.cpp:343] Persisting action (16 bytes) to 
> leveldb took 11.714283ms
> I0625 16:07:59.284390 26230 replica.cpp:676] Persisted action at 4
> I0625 16:07:59.287039 26226 replica.cpp:655] Replica received learned notice 
> for position 4
> I0625 16:07:59.297823 26226 leveldb.cpp:343] Persisting action (18 bytes) to 
> leveldb took 10.735977ms
> I0625 16:07:59.297930 26226 leveldb.cpp:401] Deleting ~2 keys from leveldb 
> took 49963ns
> I0625 16:07:59.297947 26226 replica.cpp:676] Persisted action at 4
> I0625 16:07:59.297956 26226 replica.cpp:661] Replica learned TRUNCATE action 
> at position 4
> I0625 16:07:59.298205 26226 log.cpp:672] Writer started with ending position 4
> I0625 16:07:59.298506 26226 leveldb.cpp:438] Reading position from leveldb 
> took 37502ns
> I0625 16:07:59.298533 26226 leveldb.cpp:438] Reading position from leveldb 
> took 6627ns
> I0625 16:07:59.298846 26226 registrar.cpp:346] Successfully fetched the 
> registry (250B)
> I0625 16:07:59.298873 26226 registrar.cpp:422] Attempting to update the 
> 'registry'
> I0625 16:07:59.300153 26226 log.cpp:680] Attempting to append 289 bytes to 
> the log
> I0625 16:07:59.300201 26226 coordinator.cpp:340] Coordinator attempting to 
> write APPEND action at position 5
> I0625 16:07:59.300427 26226 replica.cpp:508] Replica received write request 
> for position 5
> I0625 16:07:59.313813 26226 leveldb.cpp:343] Persisting action (308 bytes) to 
> leveldb took 13.351282ms
> I0625 16:07:59.314107 26226 replica.cpp:676] Persisted action at 5
> I0625 16:07:59.327086 26224 replica.cpp:655] Replica received learned notice 
> for position 5
> I0625 16:07:59.338047 26224 leveldb.cpp:343] Persisting action (310 bytes) to 
> leveldb took 10.688745ms
> I0625 16:07:59.338348 26224 replica.cpp:676] Persisted action at 5
> I0625 16:07:59.338714 26224 replica.cpp:661] Replica learned APPEND action at 
> position 5
> I0625 16:07:59.339380 26224 registrar.cpp:479] Successfully updated 'registry'
> I0625 16:07:59.339436 26223 log.cpp:699] Attempting to truncate the log to 5
> I0625 16:07:59.339745 26223 coordinator.cpp:340] Coordinator attempting to 
> write TRUNCATE action at position 6
> I0625 16:07:59.339993 26223 replica.cpp:508] Replica received write request 
> for position 6
> I0625 16:07:59.340585 26224 registrar.cpp:372] Successfully recovered 
> registrar
> I0625 16:07:59.341030 26228 master.cpp:980] Recovered 1 slaves from the 
> Registry (250B) ; allowing 10mins for slaves to re-register
> I0625 16:07:59.342396 26225 slave.cpp:584] New master detected at 
> [email protected]:47563
> I0625 16:07:59.342427 26225 slave.cpp:660] Authenticating with master 
> [email protected]:47563
> I0625 16:07:59.342473 26225 slave.cpp:633] Detecting new master
> I0625 16:07:59.342536 26225 sched.cpp:229] Scheduler::disconnected took 8596ns
> I0625 16:07:59.342546 26225 sched.cpp:235] New master detected at 
> [email protected]:47563
> I0625 16:07:59.342557 26225 sched.cpp:285] Authenticating with master 
> [email protected]:47563
> I0625 16:07:59.342622 26225 status_update_manager.cpp:167] New master 
> detected at [email protected]:47563
> I0625 16:07:59.342663 26225 authenticatee.hpp:128] Creating new client SASL 
> connection
> I0625 16:07:59.342738 26225 authenticatee.hpp:128] Creating new client SASL 
> connection
> I0625 16:07:59.342778 26225 master.cpp:843] Dropping 
> 'mesos.internal.AuthenticateMessage' message since not recovered yet
> I0625 16:07:59.342794 26225 master.cpp:843] Dropping 
> 'mesos.internal.AuthenticateMessage' message since not recovered yet
> I0625 16:07:59.353075 26223 leveldb.cpp:343] Persisting action (16 bytes) to 
> leveldb took 13.023984ms
> I0625 16:07:59.353137 26223 replica.cpp:676] Persisted action at 6
> I0625 16:07:59.353423 26223 replica.cpp:655] Replica received learned notice 
> for position 6
> I0625 16:07:59.364477 26223 leveldb.cpp:343] Persisting action (18 bytes) to 
> leveldb took 11.017392ms
> I0625 16:07:59.364581 26223 leveldb.cpp:401] Deleting ~2 keys from leveldb 
> took 48053ns
> I0625 16:07:59.364598 26223 replica.cpp:676] Persisted action at 6
> I0625 16:07:59.364608 26223 replica.cpp:661] Replica learned TRUNCATE action 
> at position 6
> 2014-06-25 
> 16:08:00,021:26203(0x2b44fb781700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:53090] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2014-06-25 
> 16:08:03,358:26203(0x2b44fb781700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:53090] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2014-06-25 
> 16:08:06,695:26203(0x2b44fb781700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:53090] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> tests/master_tests.cpp:1936: Failure
> Failed to wait 10secs for slaveReregisteredMessage
> I0625 16:08:09.357589 26203 master.cpp:619] Master terminating
> F0625 16:08:09.357723 26226 slave.cpp:571] Check failed: 
> !_master.isDiscarded() 
> *** Check failure stack trace: ***
>     @     0x2b44e959da2d  google::LogMessage::Fail()
>     @     0x2b44e95a292f  google::LogMessage::SendToLog()
>     @     0x2b44e959f90a  google::LogMessage::Flush()
>     @     0x2b44e959fd1d  google::LogMessageFatal::~LogMessageFatal()
>     @     0x2b44e90d82f0  mesos::internal::slave::Slave::detected()
>     @           0x4dca91  std::tr1::_Function_handler<>::_M_invoke()
>     @     0x2b44e94b42da  process::ProcessManager::resume()
>     @     0x2b44e94b4bec  process::schedule()
>     @     0x2b44eb4ba9ca  start_thread
>     @     0x2b44eb7bb1cd  (unknown)
> make[3]: *** [check-local] Aborted
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to