Hi Eugenia,

What version of Mesos are you running? I would suggest building and running
the latest trunk version.

Also, is your local ZooKeeper set up with proper ACLs? I would check the
ZooKeeper server logs for anything suspicious, because to me it looks like
both the master and the slave are having a hard time creating the znodes.
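One way to see what the create calls actually returned is to decode the
`rc=` values in the client logs. Below is a minimal sketch (Python 3.9+,
stdlib only) that scans log text for `Calling COMPLETION_... for xid=... rc=...`
lines and maps each code to its name in the `ZOO_ERRORS` enum of the
ZooKeeper C client's zookeeper.h. The function names are made up for
illustration, and the table lists only a subset of the codes. For reference,
`rc=-110`, returned for the `/znode` create in both logs below, is
ZNODEEXISTS ("the node already exists"); an ACL denial would usually show up
as ZNOAUTH (-102) or ZINVALIDACL (-114) instead.

```python
import re

# Subset of the ZOO_ERRORS enum from the ZooKeeper C client's zookeeper.h.
ZOO_ERRORS = {
    0: "ZOK",
    -4: "ZCONNECTIONLOSS",
    -7: "ZOPERATIONTIMEOUT",
    -101: "ZNONODE",
    -102: "ZNOAUTH",
    -110: "ZNODEEXISTS",
    -112: "ZSESSIONEXPIRED",
    -114: "ZINVALIDACL",
    -115: "ZAUTHFAILED",
}

# Matches "Calling COMPLETION_<TYPE> for xid=<hex> rc=<int>". The gap between
# the xid and rc fields may contain newlines and "> " quote markers, because
# mail clients wrap and quote long log lines.
COMPLETION_RE = re.compile(
    r"Calling (COMPLETION_\w+) for xid=(0x[0-9a-fA-F]+)[\s>]+rc=(-?\d+)"
)


def decode_rc(rc: int) -> str:
    """Return the ZOO_ERRORS name for rc, or UNKNOWN(<rc>) if not in the table."""
    return ZOO_ERRORS.get(rc, f"UNKNOWN({rc})")


def scan_completions(log_text: str) -> list[tuple[str, str, int, str]]:
    """Return (completion type, xid, rc, rc name) for each completion line."""
    results = []
    for m in COMPLETION_RE.finditer(log_text):
        rc = int(m.group(3))
        results.append((m.group(1), m.group(2), rc, decode_rc(rc)))
    return results


if __name__ == "__main__":
    # Two lines from the master log below, in wrapped and quoted form.
    sample = (
        "> process_completions@1817: Calling COMPLETION_STRING for xid=0x50525c01\n"
        "> rc=-110\n"
    )
    for kind, xid, rc, name in scan_completions(sample):
        # Prints: COMPLETION_STRING xid=0x50525c01 rc=-110 -> ZNODEEXISTS
        print(f"{kind} xid={xid} rc={rc} -> {name}")
```

Run against a full master or slave log, it lists every completion with its
decoded code. Looking up the same session IDs (0x139c14eb1dc0005 and
0x139c14eb1dc0006) in the ZooKeeper server log would be the complementary
check.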

@vinodkone


On Thu, Sep 13, 2012 at 3:33 PM, Eugenia Gabrielova <[email protected]> wrote:

> Dear Mesos Development Team,
>
> I'm running into an interesting failover bug when running Mesos on
> ZooKeeper. It seems to reproduce this known failover bug, which is still open:
> https://issues.apache.org/jira/browse/MESOS-246.
> Would you be willing to suggest a possible workaround or a fix I can make
> locally? I've tried a few configuration changes but am unable to find a fix.
>
> Please find below my output from a purely local setup: a standalone
> ZooKeeper plus a Mesos master and slave. I am running the latest Master and
> ZooKeeper 3.3.6 (latest stable); the operating system is CentOS 6.3. I do
> have Mesos running on ZooKeeper without issues in one local development
> environment (RHEL 6.2) with an identical configuration, but in each of the
> other environments (one CentOS 6.3, the other RHEL 6.2), fresh installs show
> similar behaviour: the ZOO_DEBUG log settles into repeated "Got ping response
> in 0 ms" lines. Please let me know if I can provide further configuration
> information.
>
> Sincerely,
> Eugenia
>
> Outputs:
> Master:
> [...] > ./bin/mesos-master.sh --zk=zk://localhost:2181/znode
> I0913 22:19:44.869660 26895 main.cpp:115] Build: 2012-09-13 19:33:05 by root
> I0913 22:19:44.870185 26895 main.cpp:116] Starting Mesos master
> I0913 22:19:44.871429 26910 master.cpp:299] Master started on 10.180.4.184:5050
> I0913 22:19:44.871531 26910 master.cpp:314] Master ID: 201209132219-3087315978-5050-26895
> W0913 22:19:44.874285 26910 master.cpp:77] No whitelist given. Advertising offers for all slaves
> 2012-09-13 22:19:44,874:26895(0x7f04a41fb720):ZOO_INFO@log_env@658: Client environment:zookeeper.version=zookeeper C client 3.3.4
> 2012-09-13 22:19:44,874:26895(0x7f04a41fb720):ZOO_INFO@log_env@662: Client environment:host.name=science-falcon
> 2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@log_env@669: Client environment:os.name=Linux
> 2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@log_env@670: Client environment:os.arch=2.6.32-279.el6.x86_64
> 2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@log_env@671: Client environment:os.version=#1 SMP Fri Jun 22 12:19:21 UTC 2012
> 2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@log_env@679: Client environment:user.name=root
> 2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@log_env@687: Client environment:user.home=/root
> 2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@log_env@699: Client environment:user.dir=/usr/local/mesos
> 2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@zookeeper_init@727: Initiating client connection, host=localhost:2181 sessionTimeout=10000 watcher=0x7f04a35e0450 sessionId=0 sessionPasswd=<null> context=0x849ea0 flags=0
> 2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_DEBUG@start_threads@152: starting threads...
> 2012-09-13 22:19:44,876:26895(0x7f049fb1d700):ZOO_DEBUG@do_io@279: started IO thread
> 2012-09-13 22:19:44,878:26895(0x7f049f11c700):ZOO_DEBUG@do_completion@326: started completion thread
> 2012-09-13 22:19:44,878:26895(0x7f049fb1d700):ZOO_INFO@check_events@1585: initiated connection to server [::1:2181]
> I0913 22:19:44.899587 26914 webui.cpp:61] Loading webui script at '/usr/local/mesos/src/webui/master/webui.py'
> 2012-09-13 22:19:44,974:26895(0x7f049fb1d700):ZOO_INFO@check_events@1632: session establishment complete on server [::1:2181], sessionId=0x139c14eb1dc0005, negotiated timeout=10000
> 2012-09-13 22:19:44,974:26895(0x7f049fb1d700):ZOO_DEBUG@check_events@1638: Calling a watcher for a ZOO_SESSION_EVENT and the state=ZOO_CONNECTED_STATE
> 2012-09-13 22:19:44,974:26895(0x7f049f11c700):ZOO_DEBUG@process_completions@1765: Calling a watcher for node [], type = -1 event=ZOO_SESSION_EVENT
> I0913 22:19:44.975180 26910 detector.cpp:287] Master detector connected to ZooKeeper ...
> I0913 22:19:44.975307 26910 detector.cpp:316] Trying to create znode '/znode' in ZooKeeper
> 2012-09-13 22:19:44,975:26895(0x7f04a1134700):ZOO_DEBUG@zoo_acreate@2503: Sending request xid=0x50525c01 for path [/znode] to ::1:2181
> 2012-09-13 22:19:44,982:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1989: Queueing asynchronous response
> 2012-09-13 22:19:44,982:26895(0x7f049f11c700):ZOO_DEBUG@process_completions@1817: Calling COMPLETION_STRING for xid=0x50525c01 rc=-110
> Bottle server starting up (using WSGIRefServer())...
> Listening on http://0.0.0.0:8080/
> Use Ctrl-C to quit.
>
> 2012-09-13 22:19:48,313:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2012-09-13 22:19:51,662:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 11 ms
> 2012-09-13 22:19:54,988:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2012-09-13 22:19:58,325:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2012-09-13 22:20:01,662:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2012-09-13 22:20:04,999:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2012-09-13 22:20:08,336:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2012-09-13 22:20:11,674:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2012-09-13 22:20:15,013:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 3 ms
> 2012-09-13 22:20:18,348:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>
> Slave:
> [...] > ./bin/mesos-slave.sh --master=zk://localhost:2181/znode "--resources=cpus:2;mem:1024"
> I0913 22:20:31.852876 26922 main.cpp:123] Creating "process" isolation module
> I0913 22:20:31.853536 26922 main.cpp:131] Build: 2012-09-13 19:33:05 by root
> I0913 22:20:31.853575 26922 main.cpp:132] Starting Mesos slave
> I0913 22:20:31.887912 26937 slave.cpp:172] Slave started on 1)@10.180.4.184:60711
> I0913 22:20:31.887969 26937 slave.cpp:173] Slave resources: cpus=2; mem=1024
> 2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@658: Client environment:zookeeper.version=zookeeper C client 3.3.4
> 2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@662: Client environment:host.name=science-falcon
> 2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@669: Client environment:os.name=Linux
> 2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@670: Client environment:os.arch=2.6.32-279.el6.x86_64
> 2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@671: Client environment:os.version=#1 SMP Fri Jun 22 12:19:21 UTC 2012
> 2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@679: Client environment:user.name=root
> 2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@687: Client environment:user.home=/root
> 2012-09-13 22:20:31,891:26922(0x7fb78eb13720):ZOO_INFO@log_env@699: Client environment:user.dir=/usr/local/mesos
> 2012-09-13 22:20:31,891:26922(0x7fb78eb13720):ZOO_INFO@zookeeper_init@727: Initiating client connection, host=localhost:2181 sessionTimeout=10000 watcher=0x7fb78def8450 sessionId=0 sessionPasswd=<null> context=0x11616c0 flags=0
> 2012-09-13 22:20:31,891:26922(0x7fb78eb13720):ZOO_DEBUG@start_threads@152: starting threads...
> 2012-09-13 22:20:31,892:26922(0x7fb78a435700):ZOO_DEBUG@do_io@279: started IO thread
> 2012-09-13 22:20:31,893:26922(0x7fb789a34700):ZOO_DEBUG@do_completion@326: started completion thread
> 2012-09-13 22:20:31,893:26922(0x7fb78a435700):ZOO_INFO@check_events@1585: initiated connection to server [127.0.0.1:2181]
> I0913 22:20:31.960598 26941 webui.cpp:61] Loading webui script at '/usr/local/mesos/src/webui/slave/webui.py'
> 2012-09-13 22:20:31,972:26922(0x7fb78a435700):ZOO_INFO@check_events@1632: session establishment complete on server [127.0.0.1:2181], sessionId=0x139c14eb1dc0006, negotiated timeout=10000
> 2012-09-13 22:20:31,972:26922(0x7fb78a435700):ZOO_DEBUG@check_events@1638: Calling a watcher for a ZOO_SESSION_EVENT and the state=ZOO_CONNECTED_STATE
> 2012-09-13 22:20:31,972:26922(0x7fb789a34700):ZOO_DEBUG@process_completions@1765: Calling a watcher for node [], type = -1 event=ZOO_SESSION_EVENT
> I0913 22:20:31.972904 26937 detector.cpp:287] Master detector connected to ZooKeeper ...
> I0913 22:20:31.973017 26937 detector.cpp:316] Trying to create znode '/znode' in ZooKeeper
> 2012-09-13 22:20:31,973:26922(0x7fb78ba4c700):ZOO_DEBUG@zoo_acreate@2503: Sending request xid=0x50525c30 for path [/znode] to 127.0.0.1:2181
> 2012-09-13 22:20:31,980:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1989: Queueing asynchronous response
> 2012-09-13 22:20:31,980:26922(0x7fb789a34700):ZOO_DEBUG@process_completions@1817: Calling COMPLETION_STRING for xid=0x50525c30 rc=-110
> Bottle server starting up (using WSGIRefServer())...
> Listening on http://0.0.0.0:8081/
> Use Ctrl-C to quit.
>
> 2012-09-13 22:20:35,311:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2012-09-13 22:20:38,648:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2012-09-13 22:20:41,986:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2012-09-13 22:20:45,323:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2012-09-13 22:20:48,660:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2012-09-13 22:20:51,997:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2012-09-13 22:20:55,334:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2012-09-13 22:20:58,672:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 1 ms
> 2012-09-13 22:21:02,008:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2012-09-13 22:21:05,346:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2012-09-13 22:21:08,683:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2012-09-13 22:21:12,020:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>
