[
https://issues.apache.org/jira/browse/MESOS-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vinod Kone resolved MESOS-1264.
-------------------------------
Resolution: Fixed
commit 84184f9979604a855143979a40fb8dd6b8e73d41
Author: Adam B <[email protected]>
Date: Fri May 2 13:40:21 2014 -0700
Fixed slave authentication to prevent sending TASK_LOST on
disconnection.
Changed master to only remove a slave's frameworks and offers on
exited, not on reauthenticate. Still disconnects at the allocator in
both cases.
Review: https://reviews.apache.org/r/21017
> Slave authentication retries can trigger TASK_LOST for non-checkpointing
> frameworks.
> ------------------------------------------------------------------------------------
>
> Key: MESOS-1264
> URL: https://issues.apache.org/jira/browse/MESOS-1264
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 0.19.0
> Reporter: Benjamin Mahler
> Assignee: Adam B
> Fix For: 0.19.0
>
>
> Looks like there is a regression with slave authentication that is making the
> FaultToleranceTest.ReconcileIncompleteTasks flaky.
> The following is what appears to be happening in the test and seems to be a
> regression:
> 1. Slave re-detects leading Master.
> 2. Slave re-authenticates with Master.
> 3. Master sees slave as already activated, calls disconnect().
> 4. For non-checkpointing frameworks, this call to disconnect() assumes the
> slave has exited, and will send TASK_LOST.
> 5. In the case where the slave did not exit, the tasks are not lost.
> Either we can consider this case to be valid as TASK_LOST, similarly to when
> the slave connection closes in Master::exited(UPID), updating the test as
> necessary.
> Or we can try to avoid sending TASK_LOST for authentication retries.
> {noformat}
> [ RUN ] FaultToleranceTest.ReconcileIncompleteTasks
> Using temporary directory
> '/tmp/FaultToleranceTest_ReconcileIncompleteTasks_NvDg7P'
> I0428 22:54:27.878968 533 leveldb.cpp:174] Opened db in 152.905109ms
> I0428 22:54:27.901624 533 leveldb.cpp:181] Compacted db in 22.554687ms
> I0428 22:54:27.901720 533 leveldb.cpp:196] Created db iterator in 5625ns
> I0428 22:54:27.901761 533 leveldb.cpp:202] Seeked to beginning of db in
> 930ns
> I0428 22:54:27.901922 533 leveldb.cpp:271] Iterated through 0 keys in the
> db in 363ns
> I0428 22:54:27.901958 533 replica.cpp:729] Replica recovered with log
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0428 22:54:27.902427 559 recover.cpp:425] Starting replica recovery
> I0428 22:54:27.902935 561 recover.cpp:451] Replica is in EMPTY status
> I0428 22:54:27.904425 556 master.cpp:266] Master
> 20140428-225427-1740121354-35964-533 (HOSTNAME) started on 10.37.184.103:35964
> I0428 22:54:27.904470 556 master.cpp:303] Master only allowing
> authenticated frameworks to register
> I0428 22:54:27.904491 556 master.cpp:308] Master only allowing
> authenticated slaves to register
> I0428 22:54:27.904515 556 credentials.hpp:35] Loading credentials for
> authentication
> I0428 22:54:27.904561 551 replica.cpp:626] Replica in EMPTY status received
> a broadcasted recover request
> W0428 22:54:27.904619 556 credentials.hpp:48] Failed to stat credentials
> file
> 'file:///tmp/FaultToleranceTest_ReconcileIncompleteTasks_NvDg7P/credentials':
> No such file or directory
> I0428 22:54:27.904887 550 recover.cpp:188] Received a recover response from
> a replica in EMPTY status
> I0428 22:54:27.905383 568 recover.cpp:542] Updating replica status to
> STARTING
> I0428 22:54:27.906673 558 master.cpp:922] The newly elected leader is
> [email protected]:35964 with id 20140428-225427-1740121354-35964-533
> I0428 22:54:27.906708 558 master.cpp:932] Elected as the leading master!
> I0428 22:54:27.906728 558 master.cpp:753] Recovering from registrar
> I0428 22:54:27.906842 556 registrar.cpp:275] Recovering registrar
> I0428 22:54:27.937743 561 leveldb.cpp:304] Persisting metadata (8 bytes) to
> leveldb took 31.905754ms
> I0428 22:54:27.937796 561 replica.cpp:320] Persisted replica status to
> STARTING
> I0428 22:54:27.938019 561 recover.cpp:451] Replica is in STARTING status
> I0428 22:54:27.939944 569 replica.cpp:626] Replica in STARTING status
> received a broadcasted recover request
> I0428 22:54:27.940160 565 recover.cpp:188] Received a recover response from
> a replica in STARTING status
> I0428 22:54:27.941345 558 recover.cpp:542] Updating replica status to VOTING
> I0428 22:54:27.962074 553 leveldb.cpp:304] Persisting metadata (8 bytes) to
> leveldb took 20.298725ms
> I0428 22:54:27.962128 553 replica.cpp:320] Persisted replica status to
> VOTING
> I0428 22:54:27.962266 553 recover.cpp:556] Successfully joined the Paxos
> group
> I0428 22:54:27.962432 553 recover.cpp:440] Recover process terminated
> I0428 22:54:27.962946 566 log.cpp:656] Attempting to start the writer
> I0428 22:54:27.964815 547 replica.cpp:474] Replica received implicit
> promise request with proposal 1
> I0428 22:54:27.970388 547 leveldb.cpp:304] Persisting metadata (8 bytes) to
> leveldb took 5.516929ms
> I0428 22:54:27.970440 547 replica.cpp:342] Persisted promised to 1
> I0428 22:54:27.971122 569 coordinator.cpp:229] Coordinator attemping to
> fill missing position
> I0428 22:54:27.972815 570 replica.cpp:375] Replica received explicit
> promise request for position 0 with proposal 2
> I0428 22:54:27.978729 570 leveldb.cpp:341] Persisting action (8 bytes) to
> leveldb took 5.798681ms
> I0428 22:54:27.978806 570 replica.cpp:664] Persisted action at 0
> I0428 22:54:27.980425 547 replica.cpp:508] Replica received write request
> for position 0
> I0428 22:54:27.980496 547 leveldb.cpp:436] Reading position from leveldb
> took 21889ns
> I0428 22:54:27.987072 547 leveldb.cpp:341] Persisting action (14 bytes) to
> leveldb took 6.529569ms
> I0428 22:54:27.987125 547 replica.cpp:664] Persisted action at 0
> I0428 22:54:27.987746 559 replica.cpp:643] Replica received learned notice
> for position 0
> I0428 22:54:27.995412 559 leveldb.cpp:341] Persisting action (16 bytes) to
> leveldb took 7.515296ms
> I0428 22:54:27.995465 559 replica.cpp:664] Persisted action at 0
> I0428 22:54:27.995499 559 replica.cpp:649] Replica learned NOP action at
> position 0
> I0428 22:54:27.996111 560 log.cpp:672] Writer started with ending position 0
> I0428 22:54:27.997576 562 leveldb.cpp:436] Reading position from leveldb
> took 23701ns
> I0428 22:54:28.000907 550 registrar.cpp:308] Successfully recovered
> registrar
> I0428 22:54:28.001000 550 registrar.cpp:379] Attempting to update the
> 'registry'
> I0428 22:54:28.003427 567 log.cpp:680] Attempting to append 153 bytes to
> the log
> I0428 22:54:28.003566 548 coordinator.cpp:339] Coordinator attempting to
> write APPEND action at position 1
> I0428 22:54:28.004590 552 replica.cpp:508] Replica received write request
> for position 1
> I0428 22:54:28.079499 552 leveldb.cpp:341] Persisting action (172 bytes) to
> leveldb took 74.874271ms
> I0428 22:54:28.079545 552 replica.cpp:664] Persisted action at 1
> I0428 22:54:28.080243 548 replica.cpp:643] Replica received learned notice
> for position 1
> I0428 22:54:28.103667 548 leveldb.cpp:341] Persisting action (174 bytes) to
> leveldb took 23.258596ms
> I0428 22:54:28.103731 548 replica.cpp:664] Persisted action at 1
> I0428 22:54:28.103847 548 replica.cpp:649] Replica learned APPEND action at
> position 1
> I0428 22:54:28.105062 568 registrar.cpp:427] Successfully updated 'registry'
> I0428 22:54:28.105399 561 log.cpp:699] Attempting to truncate the log to 1
> I0428 22:54:28.105626 559 master.cpp:780] Recovered 0 slaves from the
> Registry (115B) ; allowing 10mins for slaves to re-register
> I0428 22:54:28.105702 547 coordinator.cpp:339] Coordinator attempting to
> write TRUNCATE action at position 2
> I0428 22:54:28.107192 548 replica.cpp:508] Replica received write request
> for position 2
> I0428 22:54:28.110117 547 slave.cpp:140] Slave started on
> 4)@10.37.184.103:35964
> I0428 22:54:28.110178 547 slave.cpp:149] Moving slave process into its own
> cgroup
> I0428 22:54:28.111990 548 leveldb.cpp:341] Persisting action (16 bytes) to
> leveldb took 4.755207ms
> I0428 22:54:28.112031 548 replica.cpp:664] Persisted action at 2
> I0428 22:54:28.112576 558 replica.cpp:643] Replica received learned notice
> for position 2
> I0428 22:54:28.113647 533 sched.cpp:121] Version: 0.19.0
> I0428 22:54:28.114310 549 sched.cpp:217] New master detected at
> [email protected]:35964
> I0428 22:54:28.114351 549 sched.cpp:268] Authenticating with master
> [email protected]:35964
> I0428 22:54:28.114554 561 authenticatee.hpp:128] Creating new client SASL
> connection
> I0428 22:54:28.114681 549 master.cpp:2795] Authenticating
> scheduler(4)@10.37.184.103:35964
> I0428 22:54:28.114882 570 authenticator.hpp:148] Creating new server SASL
> connection
> I0428 22:54:28.115015 570 authenticatee.hpp:219] Received SASL
> authentication mechanisms: CRAM-MD5
> I0428 22:54:28.115047 570 authenticatee.hpp:245] Attempting to authenticate
> with mechanism 'CRAM-MD5'
> I0428 22:54:28.115114 570 authenticator.hpp:254] Received SASL
> authentication start
> I0428 22:54:28.115195 570 authenticator.hpp:342] Authentication requires
> more steps
> I0428 22:54:28.115279 548 authenticatee.hpp:265] Received SASL
> authentication step
> I0428 22:54:28.115485 556 authenticator.hpp:282] Received SASL
> authentication step
> I0428 22:54:28.115576 556 authenticator.hpp:334] Authentication success
> I0428 22:54:28.115681 555 authenticatee.hpp:305] Authentication success
> I0428 22:54:28.115736 549 master.cpp:2835] Successfully authenticated
> scheduler(4)@10.37.184.103:35964
> I0428 22:54:28.116276 566 sched.cpp:342] Successfully authenticated with
> master [email protected]:35964
> I0428 22:54:28.116435 561 master.cpp:981] Received registration request
> from scheduler(4)@10.37.184.103:35964
> I0428 22:54:28.116557 561 master.cpp:999] Registering framework
> 20140428-225427-1740121354-35964-533-0000 at scheduler(4)@10.37.184.103:35964
> I0428 22:54:28.116726 552 sched.cpp:392] Framework registered with
> 20140428-225427-1740121354-35964-533-0000
> I0428 22:54:28.116801 561 hierarchical_allocator_process.hpp:332] Added
> framework 20140428-225427-1740121354-35964-533-0000
> I0428 22:54:28.120318 558 leveldb.cpp:341] Persisting action (18 bytes) to
> leveldb took 7.6976ms
> I0428 22:54:28.120383 558 leveldb.cpp:399] Deleting ~1 keys from leveldb
> took 18353ns
> I0428 22:54:28.120409 558 replica.cpp:664] Persisted action at 2
> I0428 22:54:28.120434 558 replica.cpp:649] Replica learned TRUNCATE action
> at position 2
> I0428 22:54:28.135422 547 slave.cpp:149] Moving slave process into its own
> cgroup
> I0428 22:54:28.155840 547 credentials.hpp:35] Loading credentials for
> authentication
> W0428 22:54:28.155907 547 credentials.hpp:48] Failed to stat credentials
> file
> 'file:///tmp/FaultToleranceTest_ReconcileIncompleteTasks_aPd9kr/credential':
> No such file or directory
> I0428 22:54:28.155962 547 slave.cpp:231] Slave using credential for:
> test-principal
> I0428 22:54:28.156137 547 slave.cpp:244] Slave resources: cpus(*):2;
> mem(*):1024; disk(*):1024; ports(*):[31000-32000]
> I0428 22:54:28.156743 547 slave.cpp:272] Slave hostname: HOSTNAME
> I0428 22:54:28.156797 547 slave.cpp:273] Slave checkpoint: false
> I0428 22:54:28.157765 567 state.cpp:33] Recovering state from
> '/tmp/FaultToleranceTest_ReconcileIncompleteTasks_aPd9kr/meta'
> I0428 22:54:28.158015 564 status_update_manager.cpp:193] Recovering status
> update manager
> I0428 22:54:28.158342 548 slave.cpp:2943] Finished recovery
> I0428 22:54:28.158860 566 slave.cpp:525] New master detected at
> [email protected]:35964
> I0428 22:54:28.159013 566 slave.cpp:585] Authenticating with master
> [email protected]:35964
> I0428 22:54:28.159083 548 status_update_manager.cpp:167] New master
> detected at [email protected]:35964
> I0428 22:54:28.159139 566 slave.cpp:558] Detecting new master
> I0428 22:54:28.159340 567 authenticatee.hpp:128] Creating new client SASL
> connection
> I0428 22:54:28.159664 567 master.cpp:2795] Authenticating
> slave(4)@10.37.184.103:35964
> I0428 22:54:28.159989 559 authenticator.hpp:148] Creating new server SASL
> connection
> I0428 22:54:28.160220 548 authenticatee.hpp:219] Received SASL
> authentication mechanisms: CRAM-MD5
> I0428 22:54:28.160253 548 authenticatee.hpp:245] Attempting to authenticate
> with mechanism 'CRAM-MD5'
> I0428 22:54:28.160328 548 authenticator.hpp:254] Received SASL
> authentication start
> I0428 22:54:28.160426 548 authenticator.hpp:342] Authentication requires
> more steps
> I0428 22:54:28.160490 548 authenticatee.hpp:265] Received SASL
> authentication step
> I0428 22:54:28.160568 548 authenticator.hpp:282] Received SASL
> authentication step
> I0428 22:54:28.160673 548 authenticator.hpp:334] Authentication success
> I0428 22:54:28.160758 558 authenticatee.hpp:305] Authentication success
> I0428 22:54:28.160787 548 master.cpp:2835] Successfully authenticated
> slave(4)@10.37.184.103:35964
> I0428 22:54:28.161053 570 slave.cpp:642] Successfully authenticated with
> master [email protected]:35964
> I0428 22:54:28.161638 556 registrar.cpp:379] Attempting to update the
> 'registry'
> I0428 22:54:28.164188 570 log.cpp:680] Attempting to append 378 bytes to
> the log
> I0428 22:54:28.164309 558 coordinator.cpp:339] Coordinator attempting to
> write APPEND action at position 3
> I0428 22:54:28.165631 569 replica.cpp:508] Replica received write request
> for position 3
> I0428 22:54:28.186933 569 leveldb.cpp:341] Persisting action (397 bytes) to
> leveldb took 21.263363ms
> I0428 22:54:28.186987 569 replica.cpp:664] Persisted action at 3
> I0428 22:54:28.187626 555 replica.cpp:643] Replica received learned notice
> for position 3
> I0428 22:54:28.195307 555 leveldb.cpp:341] Persisting action (399 bytes) to
> leveldb took 7.628475ms
> I0428 22:54:28.195361 555 replica.cpp:664] Persisted action at 3
> I0428 22:54:28.195397 555 replica.cpp:649] Replica learned APPEND action at
> position 3
> I0428 22:54:28.196445 547 registrar.cpp:427] Successfully updated 'registry'
> I0428 22:54:28.196738 553 log.cpp:699] Attempting to truncate the log to 3
> I0428 22:54:28.197031 552 master.cpp:2169] Admitted slave on HOSTNAME at
> slave(4)@IP:35964
> I0428 22:54:28.197108 552 master.cpp:3283] Adding slave
> 20140428-225427-1740121354-35964-533-0 at HOST with cpus(*):2; mem(*):1024;
> disk(*):1024; ports(*):[31000-32000]
> I0428 22:54:28.196979 568 coordinator.cpp:339] Coordinator attempting to
> write TRUNCATE action at position 4
> I0428 22:54:28.197506 560 slave.cpp:675] Registered with master
> [email protected]:35964; given slave ID
> 20140428-225427-1740121354-35964-533-0
> I0428 22:54:28.197764 547 hierarchical_allocator_process.hpp:445] Added
> slave 20140428-225427-1740121354-35964-533-0 (HOST) with cpus(*):2;
> mem(*):1024; disk(*):1024; ports(*):[31000-32000] (and cpus(*):2;
> mem(*):1024; disk(*):1024; ports(*):[31000-32000] available)
> I0428 22:54:28.198402 565 master.cpp:2744] Sending 1 offers to framework
> 20140428-225427-1740121354-35964-533-0000
> I0428 22:54:28.199038 568 replica.cpp:508] Replica received write request
> for position 4
> I0428 22:54:28.200683 548 master.cpp:1806] Processing reply for offers: [
> 20140428-225427-1740121354-35964-533-0 ] on slave
> 20140428-225427-1740121354-35964-533-0 (HOST) for framework
> 20140428-225427-1740121354-35964-533-0000
> I0428 22:54:28.201052 548 master.hpp:558] Adding task 1 with resources
> cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave
> 20140428-225427-1740121354-35964-533-0 (HOST)
> I0428 22:54:28.201151 548 master.cpp:2919] Launching task 1 of framework
> 20140428-225427-1740121354-35964-533-0000 with resources cpus(*):2;
> mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave
> 20140428-225427-1740121354-35964-533-0 (HOST)
> I0428 22:54:28.201370 555 slave.cpp:905] Got assigned task 1 for framework
> 20140428-225427-1740121354-35964-533-0000
> I0428 22:54:28.201853 555 slave.cpp:1015] Launching task 1 for framework
> 20140428-225427-1740121354-35964-533-0000
> I0428 22:54:28.203640 568 leveldb.cpp:341] Persisting action (16 bytes) to
> leveldb took 4.423216ms
> I0428 22:54:28.203698 568 replica.cpp:664] Persisted action at 4
> I0428 22:54:28.204627 547 replica.cpp:643] Replica received learned notice
> for position 4
> I0428 22:54:28.206099 555 exec.cpp:131] Version: 0.19.0
> I0428 22:54:28.206481 555 slave.cpp:1125] Queuing task '1' for executor
> default of framework '20140428-225427-1740121354-35964-533-0000
> I0428 22:54:28.206670 555 slave.cpp:2282] Monitoring executor 'default' of
> framework '20140428-225427-1740121354-35964-533-0000' in container
> '77e0cf10-1c8d-47a7-946d-eb56f26e339d'
> I0428 22:54:28.206925 555 slave.cpp:1598] Got registration for executor
> 'default' of framework 20140428-225427-1740121354-35964-533-0000
> I0428 22:54:28.207355 555 slave.cpp:1717] Flushing queued task 1 for
> executor 'default' of framework 20140428-225427-1740121354-35964-533-0000
> I0428 22:54:28.207478 557 exec.cpp:205] Executor registered on slave
> 20140428-225427-1740121354-35964-533-0
> I0428 22:54:28.211570 557 slave.cpp:1953] Handling status update
> TASK_FINISHED (UUID: e227720d-b0fb-4f5f-a40f-be27f9a3a219) for task 1 of
> framework 20140428-225427-1740121354-35964-533-0000 from
> executor(4)@10.37.184.103:35964
> I0428 22:54:28.211958 547 leveldb.cpp:341] Persisting action (18 bytes) to
> leveldb took 7.29439ms
> I0428 22:54:28.211987 559 status_update_manager.cpp:320] Received status
> update TASK_FINISHED (UUID: e227720d-b0fb-4f5f-a40f-be27f9a3a219) for task 1
> of framework 20140428-225427-1740121354-35964-533-0000
> I0428 22:54:28.212195 547 leveldb.cpp:399] Deleting ~2 keys from leveldb
> took 56641ns
> I0428 22:54:28.212311 547 replica.cpp:664] Persisted action at 4
> I0428 22:54:28.212358 547 replica.cpp:649] Replica learned TRUNCATE action
> at position 4
> I0428 22:54:28.212357 559 status_update_manager.cpp:373] Forwarding status
> update TASK_FINISHED (UUID: e227720d-b0fb-4f5f-a40f-be27f9a3a219) for task 1
> of framework 20140428-225427-1740121354-35964-533-0000 to
> [email protected]:35964
> I0428 22:54:28.212858 559 slave.cpp:2076] Sending acknowledgement for
> status update TASK_FINISHED (UUID: e227720d-b0fb-4f5f-a40f-be27f9a3a219) for
> task 1 of framework 20140428-225427-1740121354-35964-533-0000 to
> executor(4)@10.37.184.103:35964
> I0428 22:54:28.222946 565 slave.cpp:525] New master detected at
> [email protected]:35964
> I0428 22:54:28.223008 565 slave.cpp:585] Authenticating with master
> [email protected]:35964
> I0428 22:54:28.223079 564 status_update_manager.cpp:167] New master
> detected at [email protected]:35964
> I0428 22:54:28.223125 565 slave.cpp:558] Detecting new master
> W0428 22:54:28.223120 564 status_update_manager.cpp:181] Resending status
> update TASK_FINISHED (UUID: e227720d-b0fb-4f5f-a40f-be27f9a3a219) for task 1
> of framework 20140428-225427-1740121354-35964-533-0000
> I0428 22:54:28.223142 568 authenticatee.hpp:128] Creating new client SASL
> connection
> I0428 22:54:28.223157 564 status_update_manager.cpp:373] Forwarding status
> update TASK_FINISHED (UUID: e227720d-b0fb-4f5f-a40f-be27f9a3a219) for task 1
> of framework 20140428-225427-1740121354-35964-533-0000 to
> [email protected]:35964
> I0428 22:54:28.223357 565 master.cpp:1263] Disconnecting slave
> 20140428-225427-1740121354-35964-533-0
> I0428 22:54:28.223460 566 hierarchical_allocator_process.hpp:484] Slave
> 20140428-225427-1740121354-35964-533-0 disconnected
> I0428 22:54:28.223486 565 master.cpp:1283] Removing non-checkpointing
> framework 20140428-225427-1740121354-35964-533-0000 from disconnected slave
> 20140428-225427-1740121354-35964-533-0(HOST)
> I0428 22:54:28.223505 565 master.cpp:3235] Removing framework
> 20140428-225427-1740121354-35964-533-0000 from slave
> 20140428-225427-1740121354-35964-533-0 (HOST)
> I0428 22:54:28.225428 565 master.cpp:2444] Status update TASK_LOST (UUID:
> dfa02d31-0a93-4808-924c-0969412c7e52) for task 1 of framework
> 20140428-225427-1740121354-35964-533-0000 from @0.0.0.0:0
> I0428 22:54:28.225581 565 master.hpp:576] Removing task 1 with resources
> cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave
> 20140428-225427-1740121354-35964-533-0 (HOST)
> I0428 22:54:28.225934 565 master.cpp:2795] Authenticating
> slave(4)@10.37.184.103:35964
> I0428 22:54:28.225975 564 hierarchical_allocator_process.hpp:637] Recovered
> cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (total
> allocatable: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]) on
> slave 20140428-225427-1740121354-35964-533-0 from framework
> 20140428-225427-1740121354-35964-533-0000
> W0428 22:54:28.226312 565 master.cpp:2437] Status update TASK_FINISHED
> (UUID: e227720d-b0fb-4f5f-a40f-be27f9a3a219) for task 1 of framework
> 20140428-225427-1740121354-35964-533-0000 from slave(4)@10.37.184.103:35964
> (HOST): error, couldn't lookup task
> I0428 22:54:28.226378 563 authenticator.hpp:148] Creating new server SASL
> connection
> I0428 22:54:28.226554 554 authenticatee.hpp:219] Received SASL
> authentication mechanisms: CRAM-MD5
> I0428 22:54:28.226663 554 authenticatee.hpp:245] Attempting to authenticate
> with mechanism 'CRAM-MD5'
> I0428 22:54:28.226667 553 status_update_manager.cpp:398] Received status
> update acknowledgement (UUID: e227720d-b0fb-4f5f-a40f-be27f9a3a219) for task
> 1 of framework 20140428-225427-1740121354-35964-533-0000
> I0428 22:54:28.226872 552 authenticator.hpp:254] Received SASL
> authentication start
> I0428 22:54:28.226994 552 authenticator.hpp:342] Authentication requires
> more steps
> I0428 22:54:28.227147 556 authenticatee.hpp:265] Received SASL
> authentication step
> I0428 22:54:28.227288 556 authenticator.hpp:282] Received SASL
> authentication step
> I0428 22:54:28.227378 556 authenticator.hpp:334] Authentication success
> I0428 22:54:28.227519 548 authenticatee.hpp:305] Authentication success
> I0428 22:54:28.227524 556 master.cpp:2835] Successfully authenticated
> slave(4)@10.37.184.103:35964
> I0428 22:54:28.227975 556 slave.cpp:642] Successfully authenticated with
> master [email protected]:35964
> W0428 22:54:28.228243 568 master.cpp:2244] Slave at
> slave(4)@10.37.184.103:35964 (HOST) is being allowed to re-register with an
> already in use id (20140428-225427-1740121354-35964-533-0)
> I0428 22:54:28.228421 556 slave.cpp:725] Re-registered with master
> [email protected]:35964
> I0428 22:54:28.228457 553 hierarchical_allocator_process.hpp:498] Slave
> 20140428-225427-1740121354-35964-533-0 reconnected
> ../../src/tests/fault_tolerance_tests.cpp:2084: Failure
> Value of: status.get().state()
> Actual: TASK_LOST
> Expected: TASK_FINISHED
> /var/tmp/sclly9xeu: line 8: 533 Segmentation fault
> './bin/mesos-tests.sh'
> '--gtest_filter=FaultToleranceTest.ReconcileIncompleteTasks'
> '--gtest_repeat=-1' '--gtest_break_on_failure' '--verbose'
> [bmahler@HOST build]$ Write failed: Broken pipe
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.2#6252)