[ https://issues.apache.org/jira/browse/MESOS-653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749929#comment-13749929 ]

Tomas Barton commented on MESOS-653:
------------------------------------

This seems to be the most relevant part:

{code}
I0821 11:48:08.839939 11199 master.cpp:1029] Attempting to register slave on worker1.localnet at slave(1)@127.0.0.1:5051
I0821 11:48:08.839978 11199 master.cpp:2022] Adding slave 201308191529-2380865683-5050-11189-4621 at worker1.localnet with cpus(*):4; mem(*):6866; disk(*):10288; ports(*):[31000-32000]
I0821 11:48:08.840117 11198 hierarchical_allocator_process.hpp:430] Added slave 201308191529-2380865683-5050-11189-4621 (worker1.localnet) with cpus(*):4; mem(*):6866; disk(*):10288; ports(*):[31000-32000] (and cpus(*):4; mem(*):6866; disk(*):10288; ports(*):[31000-32000] available)
I0821 11:48:08.939265 11199 master.cpp:1029] Attempting to register slave on lab.localnet at slave(1)@127.0.1.1:5051
I0821 11:48:08.939292 11199 master.cpp:2022] Adding slave 201308191529-2380865683-5050-11189-4622 at lab.localnet with cpus(*):4; mem(*):988; disk(*):38812; ports(*):[31000-32000]
I0821 11:48:08.939414 11199 hierarchical_allocator_process.hpp:430] Added slave 201308191529-2380865683-5050-11189-4622 (lab.localnet) with cpus(*):4; mem(*):988; disk(*):38812; ports(*):[31000-32000] (and cpus(*):4; mem(*):988; disk(*):38812; ports(*):[31000-32000] available)
I0821 11:48:08.940052 11198 master.cpp:521] Slave 201308191529-2380865683-5050-11189-4622 (lab.localnet) disconnected
I0821 11:48:08.940073 11198 master.cpp:526] Removing disconnected slave 201308191529-2380865683-5050-11189-4622(lab.localnet) because it is not checkpointing!
I0821 11:48:08.940106 11198 master.cpp:521] Slave 201308191529-2380865683-5050-11189-4621 (worker1.localnet) disconnected
I0821 11:48:08.940119 11198 master.cpp:526] Removing disconnected slave 201308191529-2380865683-5050-11189-4621(worker1.localnet) because it is not checkpointing!
I0821 11:48:08.940124 11200 hierarchical_allocator_process.hpp:455] Removed slave 201308191529-2380865683-5050-11189-4622
I0821 11:48:08.940163 11200 hierarchical_allocator_process.hpp:455] Removed slave 201308191529-2380865683-5050-11189-4621
I0821 11:48:09.003453 11198 http.cpp:349] HTTP request for '/master/state.json'
W0821 11:48:09.660881 11201 master.cpp:83] No whitelist given. Advertising offers for all slaves
{code}

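No, checkpointing wasn't enabled. Two slaves (worker1.localnet and lab.localnet) can't stay connected to the master. Each one registers, is reported as disconnected shortly afterwards, and is then removed because it is not checkpointing.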
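To check this reading of the log, a small script can pair each slave ID the master assigned with its hostname and the reason it was removed. This is a hypothetical helper, not part of Mesos. Its regular expressions assume the master's log messages as shown in the excerpt above, with each log entry on a single line.

```python
import re

# Hypothetical helper (not part of Mesos). The patterns assume the master's
# log messages shown above, one log entry per line.
ADDED = re.compile(r"Adding slave (\S+) at (\S+) with")
REMOVED = re.compile(r"Removing disconnected slave (\S+?)\((\S+)\) because (.+)$")


def summarize(lines):
    """Map each slave ID to its hostname and, if removed, the stated reason."""
    slaves = {}
    for line in lines:
        added = ADDED.search(line)
        if added:
            slave_id, host = added.groups()
            slaves[slave_id] = {"host": host, "removed_because": None}
            continue
        removed = REMOVED.search(line)
        if removed:
            slave_id, host, reason = removed.groups()
            entry = slaves.setdefault(slave_id, {"host": host, "removed_because": None})
            entry["removed_because"] = reason
    return slaves


# Four entries copied from the master log above.
SAMPLE_LOG = [
    "I0821 11:48:08.839978 11199 master.cpp:2022] Adding slave "
    "201308191529-2380865683-5050-11189-4621 at worker1.localnet with cpus(*):4; "
    "mem(*):6866; disk(*):10288; ports(*):[31000-32000]",
    "I0821 11:48:08.939292 11199 master.cpp:2022] Adding slave "
    "201308191529-2380865683-5050-11189-4622 at lab.localnet with cpus(*):4; "
    "mem(*):988; disk(*):38812; ports(*):[31000-32000]",
    "I0821 11:48:08.940073 11198 master.cpp:526] Removing disconnected slave "
    "201308191529-2380865683-5050-11189-4622(lab.localnet) because it is not checkpointing!",
    "I0821 11:48:08.940119 11198 master.cpp:526] Removing disconnected slave "
    "201308191529-2380865683-5050-11189-4621(worker1.localnet) because it is not checkpointing!",
]

if __name__ == "__main__":
    for slave_id, info in summarize(SAMPLE_LOG).items():
        print(slave_id, info["host"], info["removed_because"])
```

Run against these four lines, the script reports that both worker1.localnet and lab.localnet were removed because they were not checkpointing.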
                
> Slave gets wrong id and commits suicide
> ---------------------------------------
>
>                 Key: MESOS-653
>                 URL: https://issues.apache.org/jira/browse/MESOS-653
>             Project: Mesos
>          Issue Type: Bug
>          Components: slave
>    Affects Versions: 0.14.0
>         Environment: Debian 6, mesos git rev: 853c9ba
>            Reporter: Tomas Barton
>              Labels: slave
>
> After a restart, the slave can't re-register and crashes.
> I0821 11:48:08.825433 24723 slave.cpp:113] Slave started on 1)@127.0.0.1:5051
> 2013-08-21 11:48:08,825:24712(0x7f18da82a700):ZOO_INFO@log_env@658: Client environment:zookeeper.version=zookeeper C client 3.3.4
> 2013-08-21 11:48:08,825:24712(0x7f18da82a700):ZOO_INFO@log_env@662: Client environment:host.name=192.168.1.1
> 2013-08-21 11:48:08,825:24712(0x7f18da82a700):ZOO_INFO@log_env@669: Client environment:os.name=Linux
> 2013-08-21 11:48:08,825:24712(0x7f18da82a700):ZOO_INFO@log_env@670: Client environment:os.arch=2.6.32-5-amd64
> 2013-08-21 11:48:08,825:24712(0x7f18da82a700):ZOO_INFO@log_env@671: Client environment:os.version=#1 SMP Fri May 10 08:43:19 UTC 2013
> 2013-08-21 11:48:08,825:24712(0x7f18da82a700):ZOO_INFO@log_env@679: Client environment:user.name=(null)
> I0821 11:48:08.825728 24723 slave.cpp:213] Slave resources: cpus(*):4; mem(*):6866; disk(*):10288; ports(*):[31000-32000]
> 2013-08-21 11:48:08,825:24712(0x7f18da82a700):ZOO_INFO@log_env@687: Client environment:user.home=/root
> 2013-08-21 11:48:08,825:24712(0x7f18da82a700):ZOO_INFO@log_env@699: Client environment:user.dir=/
> 2013-08-21 11:48:08,825:24712(0x7f18da82a700):ZOO_INFO@zookeeper_init@727: Initiating client connection, host=192.168.1.1:2181 sessionTimeout=10000 watcher=0x7f18e0086aa0 sessionId=0 sessionPasswd=<null> context=0x7f18d4004d50 flags=0
> 2013-08-21 11:48:08,825:24712(0x7f18da82a700):ZOO_DEBUG@start_threads@152: starting threads...
> 2013-08-21 11:48:08,826:24712(0x7f18d3df4700):ZOO_DEBUG@do_io@279: started IO thread
> 2013-08-21 11:48:08,826:24712(0x7f18d35f3700):ZOO_DEBUG@do_completion@326: started completion thread
> 2013-08-21 11:48:08,826:24712(0x7f18d3df4700):ZOO_INFO@check_events@1585: initiated connection to server [192.168.1.1:2181]
> I0821 11:48:08.826833 24722 process_isolator.cpp:315] Recovering isolator
> I0821 11:48:08.826894 24724 slave.cpp:403] Finished recovery
> I0821 11:48:08.826939 24724 slave.cpp:423] Garbage collecting old slave 201308151322-2380865683-5050-29275-0
> I0821 11:48:08.826990 24723 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201308151322-2380865683-5050-29275-0' for removal
> I0821 11:48:08.827003 24724 slave.cpp:423] Garbage collecting old slave 201308191529-2380865683-5050-11189-0
> I0821 11:48:08.827046 24723 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201308191529-2380865683-5050-11189-0' for removal
> 2013-08-21 11:48:08,838:24712(0x7f18d3df4700):ZOO_INFO@check_events@1632: session establishment complete on server [192.168.1.1:2181], sessionId=0x14076ce74240387, negotiated timeout=10000
> 2013-08-21 11:48:08,838:24712(0x7f18d3df4700):ZOO_DEBUG@check_events@1638: Calling a watcher for a ZOO_SESSION_EVENT and the state=ZOO_CONNECTED_STATE
> 2013-08-21 11:48:08,838:24712(0x7f18d35f3700):ZOO_DEBUG@process_completions@1765: Calling a watcher for node [], type = -1 event=ZOO_SESSION_EVENT
> I0821 11:48:08.838765 24723 detector.cpp:234] Master detector (slave(1)@127.0.0.1:5051) connected to ZooKeeper ...
> I0821 11:48:08.838783 24723 detector.cpp:251] Trying to create path '/mesos' in ZooKeeper
> 2013-08-21 11:48:08,838:24712(0x7f18da029700):ZOO_DEBUG@zoo_awexists@2587: Sending request xid=0x52148cd9 for path [/mesos] to 192.168.1.1:2181
> 2013-08-21 11:48:08,839:24712(0x7f18d3df4700):ZOO_DEBUG@zookeeper_process@1989: Queueing asynchronous response
> 2013-08-21 11:48:08,839:24712(0x7f18d35f3700):ZOO_DEBUG@process_completions@1784: Calling COMPLETION_STAT for xid=0x52148cd9 rc=0
> 2013-08-21 11:48:08,839:24712(0x7f18da029700):ZOO_DEBUG@zoo_awget_children_@2626: Sending request xid=0x52148cda for path [/mesos] to 192.168.1.1:2181
> 2013-08-21 11:48:08,839:24712(0x7f18d3df4700):ZOO_DEBUG@zookeeper_process@1989: Queueing asynchronous response
> 2013-08-21 11:48:08,839:24712(0x7f18d35f3700):ZOO_DEBUG@process_completions@1795: Calling COMPLETION_STRINGLIST for xid=0x52148cda rc=0
> I0821 11:48:08.839437 24723 detector.cpp:420] Master detector (slave(1)@127.0.0.1:5051)  found 1 registered masters
> 2013-08-21 11:48:08,839:24712(0x7f18da029700):ZOO_DEBUG@zoo_awget@2414: Sending request xid=0x52148cdb for path [/mesos/0000000006] to 192.168.1.1:2181
> 2013-08-21 11:48:08,839:24712(0x7f18d3df4700):ZOO_DEBUG@zookeeper_process@1989: Queueing asynchronous response
> 2013-08-21 11:48:08,839:24712(0x7f18d35f3700):ZOO_DEBUG@process_completions@1772: Calling COMPLETION_DATA for xid=0x52148cdb rc=0
> I0821 11:48:08.839607 24723 detector.cpp:467] Master detector (slave(1)@127.0.0.1:5051)  got new master pid: [email protected]:5050
> I0821 11:48:08.839712 24724 slave.cpp:542] New master detected at [email protected]:5050
> I0821 11:48:08.839778 24723 status_update_manager.cpp:157] New master detected at [email protected]:5050
> I0821 11:48:08.840149 24721 slave.cpp:602] Registered with master [email protected]:5050; given slave ID 201308191529-2380865683-5050-11189-4621
> Registered but got wrong id: 201308191529-2380865683-5050-11189-4622(expected: 201308191529-2380865683-5050-11189-4621). Committing suicide
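The IDs in the suicide message can be compared with the master log in the comment above. The following is a hypothetical sketch, not Mesos code. It assumes that a slave ID is the master ID followed by a dash and a counter, and splits each ID at its last dash. That format is inferred from the IDs in these logs, not taken from the Mesos sources.

```python
import re

# Hypothetical helper (not part of Mesos). Assumes a slave ID has the form
# "<master id>-<counter>", which is inferred from the IDs in these logs.
WRONG_ID = re.compile(
    r"Registered but got wrong id:\s*(\S+?)\(expected:\s*(\S+?)\)"
)


def split_slave_id(slave_id):
    """Split '<master id>-<counter>' at the last dash."""
    master_id, _, counter = slave_id.rpartition("-")
    return master_id, int(counter)


def compare_ids(message):
    """Return how the received and expected IDs differ, or None if no match."""
    match = WRONG_ID.search(message)
    if match is None:
        return None
    (got_master, got), (exp_master, expected) = (
        split_slave_id(s) for s in match.groups()
    )
    return {"same_master": got_master == exp_master, "got": got, "expected": expected}


# The suicide message from the slave log, joined onto one line.
MESSAGE = (
    "Registered but got wrong id: 201308191529-2380865683-5050-11189-4622"
    "(expected: 201308191529-2380865683-5050-11189-4621). Committing suicide"
)

if __name__ == "__main__":
    print(compare_ids(MESSAGE))
```

For this message, both IDs share the prefix 201308191529-2380865683-5050-11189. The slave received counter 4622, which the master log assigned to lab.localnet, while it expected 4621, which the master log assigned to worker1.localnet.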

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
