Hi Eugenia,

What version of Mesos are you running? I would suggest building and running the latest trunk version.
Also, is your local ZooKeeper set up with proper ACLs? I would look at the ZooKeeper logs to see if they show anything suspicious, because, to me, it looks like both the master and the slave are having a hard time creating the znodes.

@vinodkone

On Thu, Sep 13, 2012 at 3:33 PM, Eugenia Gabrielova <[email protected]> wrote:

> Dear Mesos Development Team,
>
> I'm running into an interesting failover bug when running Mesos on
> Zookeeper. It seems to replicate this known failover bug, which is open:
> https://issues.apache.org/jira/browse/MESOS-246.
> Would you be willing to suggest a possible workaround or a fix I can make
> locally? I've tried a few configuration changes but am unable to find a fix.
>
> Please find below my output from just a local setup, running a local
> standalone Zookeeper + Mesos Master + Slave. I am running the latest
> Master, and Zookeeper 3.3.6 (Stable latest); operating system is CentOS
> 6.3. I do have Mesos running without issues on Zookeeper in one local
> development environment (RHEL 6.2) with an identical configuration, but in
> each other environment (one CentOS 6.3, other RHEL 6.2), it encounters
> similar ZOO_DEBUG -> ping 0ms behaviour on fresh installs. Please let me
> know if I can provide further configuration information.
>
> Sincerely,
> Eugenia
>
> Outputs:
>
> *Master*
> [...]
> ./bin/mesos-master.sh --zk=zk://localhost:2181/znode
> I0913 22:19:44.869660 26895 main.cpp:115] Build: 2012-09-13 19:33:05 by root
> I0913 22:19:44.870185 26895 main.cpp:116] Starting Mesos master
> I0913 22:19:44.871429 26910 master.cpp:299] Master started on 10.180.4.184:5050
> I0913 22:19:44.871531 26910 master.cpp:314] Master ID: 201209132219-3087315978-5050-26895
> W0913 22:19:44.874285 26910 master.cpp:77] No whitelist given. Advertising offers for all slaves
> 2012-09-13 22:19:44,874:26895(0x7f04a41fb720):ZOO_INFO@log_env@658: Client environment:zookeeper.version=zookeeper C client 3.3.4
> 2012-09-13 22:19:44,874:26895(0x7f04a41fb720):ZOO_INFO@log_env@662: Client environment:host.name=science-falcon
> 2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@log_env@669: Client environment:os.name=Linux
> 2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@log_env@670: Client environment:os.arch=2.6.32-279.el6.x86_64
> 2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@log_env@671: Client environment:os.version=#1 SMP Fri Jun 22 12:19:21 UTC 2012
> 2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@log_env@679: Client environment:user.name=root
> 2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@log_env@687: Client environment:user.home=/root
> 2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@log_env@699: Client environment:user.dir=/usr/local/mesos
> 2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@zookeeper_init@727: Initiating client connection, host=localhost:2181 sessionTimeout=10000 watcher=0x7f04a35e0450 sessionId=0 sessionPasswd=<null> context=0x849ea0 flags=0
> 2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_DEBUG@start_threads@152: starting threads...
> 2012-09-13 22:19:44,876:26895(0x7f049fb1d700):ZOO_DEBUG@do_io@279: started IO thread
> 2012-09-13 22:19:44,878:26895(0x7f049f11c700):ZOO_DEBUG@do_completion@326: started completion thread
> 2012-09-13 22:19:44,878:26895(0x7f049fb1d700):ZOO_INFO@check_events@1585: initiated connection to server [::1:2181]
> I0913 22:19:44.899587 26914 webui.cpp:61] Loading webui script at '/usr/local/mesos/src/webui/master/webui.py'
> 2012-09-13 22:19:44,974:26895(0x7f049fb1d700):ZOO_INFO@check_events@1632: session establishment complete on server [::1:2181], sessionId=0x139c14eb1dc0005, negotiated timeout=10000
> 2012-09-13 22:19:44,974:26895(0x7f049fb1d700):ZOO_DEBUG@check_events@1638: Calling a watcher for a ZOO_SESSION_EVENT and the state=ZOO_CONNECTED_STATE
> 2012-09-13 22:19:44,974:26895(0x7f049f11c700):ZOO_DEBUG@process_completions@1765: Calling a watcher for node [], type = -1 event=ZOO_SESSION_EVENT
> I0913 22:19:44.975180 26910 detector.cpp:287] Master detector connected to ZooKeeper ...
> I0913 22:19:44.975307 26910 detector.cpp:316] Trying to create znode '/znode' in ZooKeeper
> 2012-09-13 22:19:44,975:26895(0x7f04a1134700):ZOO_DEBUG@zoo_acreate@2503: Sending request xid=0x50525c01 for path [/znode] to ::1:2181
> 2012-09-13 22:19:44,982:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1989: Queueing asynchronous response
> 2012-09-13 22:19:44,982:26895(0x7f049f11c700):ZOO_DEBUG@process_completions@1817: Calling COMPLETION_STRING for xid=0x50525c01 rc=-110
> Bottle server starting up (using WSGIRefServer())...
> Listening on http://0.0.0.0:8080/
> Use Ctrl-C to quit.
>
> 2012-09-13 22:19:48,313:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2012-09-13 22:19:51,662:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 11 ms
> 2012-09-13 22:19:54,988:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2012-09-13 22:19:58,325:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2012-09-13 22:20:01,662:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2012-09-13 22:20:04,999:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2012-09-13 22:20:08,336:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2012-09-13 22:20:11,674:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2012-09-13 22:20:15,013:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 3 ms
> 2012-09-13 22:20:18,348:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>
> *Slave*
> [...]
> ./bin/mesos-slave.sh --master=zk://localhost:2181/znode "--resources=cpus:2;mem:1024"
> I0913 22:20:31.852876 26922 main.cpp:123] Creating "process" isolation module
> I0913 22:20:31.853536 26922 main.cpp:131] Build: 2012-09-13 19:33:05 by root
> I0913 22:20:31.853575 26922 main.cpp:132] Starting Mesos slave
> I0913 22:20:31.887912 26937 slave.cpp:172] Slave started on 1)@10.180.4.184:60711
> I0913 22:20:31.887969 26937 slave.cpp:173] Slave resources: cpus=2; mem=1024
> 2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@658: Client environment:zookeeper.version=zookeeper C client 3.3.4
> 2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@662: Client environment:host.name=science-falcon
> 2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@669: Client environment:os.name=Linux
> 2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@670: Client environment:os.arch=2.6.32-279.el6.x86_64
> 2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@671: Client environment:os.version=#1 SMP Fri Jun 22 12:19:21 UTC 2012
> 2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@679: Client environment:user.name=root
> 2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@687: Client environment:user.home=/root
> 2012-09-13 22:20:31,891:26922(0x7fb78eb13720):ZOO_INFO@log_env@699: Client environment:user.dir=/usr/local/mesos
> 2012-09-13 22:20:31,891:26922(0x7fb78eb13720):ZOO_INFO@zookeeper_init@727: Initiating client connection, host=localhost:2181 sessionTimeout=10000 watcher=0x7fb78def8450 sessionId=0 sessionPasswd=<null> context=0x11616c0 flags=0
> 2012-09-13 22:20:31,891:26922(0x7fb78eb13720):ZOO_DEBUG@start_threads@152: starting threads...
> 2012-09-13 22:20:31,892:26922(0x7fb78a435700):ZOO_DEBUG@do_io@279: started IO thread
> 2012-09-13 22:20:31,893:26922(0x7fb789a34700):ZOO_DEBUG@do_completion@326: started completion thread
> 2012-09-13 22:20:31,893:26922(0x7fb78a435700):ZOO_INFO@check_events@1585: initiated connection to server [127.0.0.1:2181]
> I0913 22:20:31.960598 26941 webui.cpp:61] Loading webui script at '/usr/local/mesos/src/webui/slave/webui.py'
> 2012-09-13 22:20:31,972:26922(0x7fb78a435700):ZOO_INFO@check_events@1632: session establishment complete on server [127.0.0.1:2181], sessionId=0x139c14eb1dc0006, negotiated timeout=10000
> 2012-09-13 22:20:31,972:26922(0x7fb78a435700):ZOO_DEBUG@check_events@1638: Calling a watcher for a ZOO_SESSION_EVENT and the state=ZOO_CONNECTED_STATE
> 2012-09-13 22:20:31,972:26922(0x7fb789a34700):ZOO_DEBUG@process_completions@1765: Calling a watcher for node [], type = -1 event=ZOO_SESSION_EVENT
> I0913 22:20:31.972904 26937 detector.cpp:287] Master detector connected to ZooKeeper ...
> I0913 22:20:31.973017 26937 detector.cpp:316] Trying to create znode '/znode' in ZooKeeper
> 2012-09-13 22:20:31,973:26922(0x7fb78ba4c700):ZOO_DEBUG@zoo_acreate@2503: Sending request xid=0x50525c30 for path [/znode] to 127.0.0.1:2181
> 2012-09-13 22:20:31,980:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1989: Queueing asynchronous response
> 2012-09-13 22:20:31,980:26922(0x7fb789a34700):ZOO_DEBUG@process_completions@1817: Calling COMPLETION_STRING for xid=0x50525c30 rc=-110
> Bottle server starting up (using WSGIRefServer())...
> Listening on http://0.0.0.0:8081/
> Use Ctrl-C to quit.
>
> 2012-09-13 22:20:35,311:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2012-09-13 22:20:38,648:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2012-09-13 22:20:41,986:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2012-09-13 22:20:45,323:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2012-09-13 22:20:48,660:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2012-09-13 22:20:51,997:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2012-09-13 22:20:55,334:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2012-09-13 22:20:58,672:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 1 ms
> 2012-09-13 22:21:02,008:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2012-09-13 22:21:05,346:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2012-09-13 22:21:08,683:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2012-09-13 22:21:12,020:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
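
When checking the logs for anything suspicious, the `rc=` values on the `COMPLETION_STRING` lines are a useful starting point, since they are numeric return codes from the ZooKeeper C client. The following is a minimal sketch for turning them into names. The table is only a subset of the `ZOO_ERRORS` enum from the client's `zookeeper.h`, and the helper name `decode_rc` is made up for illustration; it is not part of Mesos or ZooKeeper.

```python
import re

# Subset of the ZOO_ERRORS enum from the ZooKeeper C client (zookeeper.h).
# Codes not listed here are reported as unknown rather than guessed.
ZOO_ERRORS = {
    0: "ZOK",
    -4: "ZCONNECTIONLOSS",
    -7: "ZOPERATIONTIMEOUT",
    -101: "ZNONODE",
    -102: "ZNOAUTH",
    -110: "ZNODEEXISTS",
    -112: "ZSESSIONEXPIRED",
    -114: "ZINVALIDACL",
    -115: "ZAUTHFAILED",
}

# Matches the "rc=<number>" field printed by the client's debug logging.
RC_PATTERN = re.compile(r"\brc=(-?\d+)")


def decode_rc(line):
    """Hypothetical helper: return the symbolic name for the rc= code in a
    log line, or None if the line carries no rc= field."""
    match = RC_PATTERN.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    return ZOO_ERRORS.get(code, "unknown (%d)" % code)


if __name__ == "__main__":
    sample = ("process_completions@1817: Calling COMPLETION_STRING "
              "for xid=0x50525c01 rc=-110")
    print(decode_rc(sample))
```

Applied to the master and slave output above, `rc=-110` decodes to `ZNODEEXISTS`, meaning the create call reported that `/znode` was already present. An ACL problem would more likely surface as `ZNOAUTH` or `ZINVALIDACL`, so the ZooKeeper server log is still worth checking for those.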
