Dear Mesos Development Team,

I'm running into an interesting failover bug when running Mesos with ZooKeeper. It appears to reproduce this known failover bug, which is still open: https://issues.apache.org/jira/browse/MESOS-246. Could you suggest a possible workaround, or a fix I could apply locally? I've tried several configuration changes but haven't been able to resolve it.

Below is my output from a purely local setup: a standalone ZooKeeper server plus a Mesos master and slave, all on one machine. I am running the latest Mesos master and ZooKeeper 3.3.6 (the latest stable release), on CentOS 6.3. In one local development environment (RHEL 6.2), Mesos runs on ZooKeeper without issues using an identical configuration. In each of my other environments (one CentOS 6.3, one RHEL 6.2), fresh installs show the same behaviour: after connecting, the logs settle into repeated ZOO_DEBUG "Got ping response in 0 ms" messages. Please let me know if I can provide any further configuration information.

Sincerely,
Eugenia

Outputs:
Master:
[...] > ./bin/mesos-master.sh --zk=zk://localhost:2181/znode
I0913 22:19:44.869660 26895 main.cpp:115] Build: 2012-09-13 19:33:05 by root
I0913 22:19:44.870185 26895 main.cpp:116] Starting Mesos master
I0913 22:19:44.871429 26910 master.cpp:299] Master started on 10.180.4.184:5050
I0913 22:19:44.871531 26910 master.cpp:314] Master ID: 201209132219-3087315978-5050-26895
W0913 22:19:44.874285 26910 master.cpp:77] No whitelist given. Advertising offers for all slaves
2012-09-13 22:19:44,874:26895(0x7f04a41fb720):ZOO_INFO@log_env@658: Client environment:zookeeper.version=zookeeper C client 3.3.4
2012-09-13 22:19:44,874:26895(0x7f04a41fb720):ZOO_INFO@log_env@662: Client environment:host.name=science-falcon
2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@log_env@669: Client environment:os.name=Linux
2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@log_env@670: Client environment:os.arch=2.6.32-279.el6.x86_64
2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@log_env@671: Client environment:os.version=#1 SMP Fri Jun 22 12:19:21 UTC 2012
2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@log_env@679: Client environment:user.name=root
2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@log_env@687: Client environment:user.home=/root
2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@log_env@699: Client environment:user.dir=/usr/local/mesos
2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@zookeeper_init@727: Initiating client connection, host=localhost:2181 sessionTimeout=10000 watcher=0x7f04a35e0450 sessionId=0 sessionPasswd=<null> context=0x849ea0 flags=0
2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_DEBUG@start_threads@152: starting threads...
2012-09-13 22:19:44,876:26895(0x7f049fb1d700):ZOO_DEBUG@do_io@279: started IO thread
2012-09-13 22:19:44,878:26895(0x7f049f11c700):ZOO_DEBUG@do_completion@326: started completion thread
2012-09-13 22:19:44,878:26895(0x7f049fb1d700):ZOO_INFO@check_events@1585: initiated connection to server [::1:2181]
I0913 22:19:44.899587 26914 webui.cpp:61] Loading webui script at '/usr/local/mesos/src/webui/master/webui.py'
2012-09-13 22:19:44,974:26895(0x7f049fb1d700):ZOO_INFO@check_events@1632: session establishment complete on server [::1:2181], sessionId=0x139c14eb1dc0005, negotiated timeout=10000
2012-09-13 22:19:44,974:26895(0x7f049fb1d700):ZOO_DEBUG@check_events@1638: Calling a watcher for a ZOO_SESSION_EVENT and the state=ZOO_CONNECTED_STATE
2012-09-13 22:19:44,974:26895(0x7f049f11c700):ZOO_DEBUG@process_completions@1765: Calling a watcher for node [], type = -1 event=ZOO_SESSION_EVENT
I0913 22:19:44.975180 26910 detector.cpp:287] Master detector connected to ZooKeeper ...
I0913 22:19:44.975307 26910 detector.cpp:316] Trying to create znode '/znode' in ZooKeeper
2012-09-13 22:19:44,975:26895(0x7f04a1134700):ZOO_DEBUG@zoo_acreate@2503: Sending request xid=0x50525c01 for path [/znode] to ::1:2181
2012-09-13 22:19:44,982:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1989: Queueing asynchronous response
2012-09-13 22:19:44,982:26895(0x7f049f11c700):ZOO_DEBUG@process_completions@1817: Calling COMPLETION_STRING for xid=0x50525c01 rc=-110
Bottle server starting up (using WSGIRefServer())...
Listening on http://0.0.0.0:8080/
Use Ctrl-C to quit.

2012-09-13 22:19:48,313:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
2012-09-13 22:19:51,662:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 11 ms
2012-09-13 22:19:54,988:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
2012-09-13 22:19:58,325:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
2012-09-13 22:20:01,662:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
2012-09-13 22:20:04,999:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
2012-09-13 22:20:08,336:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
2012-09-13 22:20:11,674:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
2012-09-13 22:20:15,013:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 3 ms
2012-09-13 22:20:18,348:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
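In case it helps with triage: the master's create request for '/znode' completes with rc=-110. The sketch below is a minimal Python helper that maps a subset of the ZooKeeper C client's error codes to their names, using the values of the ZOO_ERRORS enum in zookeeper.h. ZK_ERRORS and decode_rc are hypothetical helper names written only for this illustration.

```python
import re

# Subset of the ZOO_ERRORS enum from the ZooKeeper C client header zookeeper.h.
ZK_ERRORS = {
    0: "ZOK",
    -4: "ZCONNECTIONLOSS",
    -7: "ZOPERATIONTIMEOUT",
    -101: "ZNONODE",
    -102: "ZNOAUTH",
    -103: "ZBADVERSION",
    -110: "ZNODEEXISTS",
    -111: "ZNOTEMPTY",
    -112: "ZSESSIONEXPIRED",
}

# Matches the "rc=<int>" field at the end of a process_completions log line.
RC_PATTERN = re.compile(r"rc=(-?\d+)")


def decode_rc(log_line):
    """Return the symbolic ZooKeeper error name for the rc= value in
    log_line, or None if the line carries no rc= field (hypothetical helper)."""
    match = RC_PATTERN.search(log_line)
    if match is None:
        return None
    rc = int(match.group(1))
    return ZK_ERRORS.get(rc, "unknown rc %d" % rc)


if __name__ == "__main__":
    line = ("2012-09-13 22:19:44,982:26895(0x7f049f11c700):ZOO_DEBUG@"
            "process_completions@1817: Calling COMPLETION_STRING for "
            "xid=0x50525c01 rc=-110")
    print(decode_rc(line))
```

Read this way, -110 is ZNODEEXISTS. That would mean the create failed only because '/znode' already existed, for example from an earlier run. The log alone does not show whether the Mesos detector treats this as benign.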

Slave:
[...] > ./bin/mesos-slave.sh --master=zk://localhost:2181/znode "--resources=cpus:2;mem:1024"
I0913 22:20:31.852876 26922 main.cpp:123] Creating "process" isolation module
I0913 22:20:31.853536 26922 main.cpp:131] Build: 2012-09-13 19:33:05 by root
I0913 22:20:31.853575 26922 main.cpp:132] Starting Mesos slave
I0913 22:20:31.887912 26937 slave.cpp:172] Slave started on 1)@10.180.4.184:60711
I0913 22:20:31.887969 26937 slave.cpp:173] Slave resources: cpus=2; mem=1024
2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@658: Client environment:zookeeper.version=zookeeper C client 3.3.4
2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@662: Client environment:host.name=science-falcon
2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@669: Client environment:os.name=Linux
2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@670: Client environment:os.arch=2.6.32-279.el6.x86_64
2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@671: Client environment:os.version=#1 SMP Fri Jun 22 12:19:21 UTC 2012
2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@679: Client environment:user.name=root
2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@687: Client environment:user.home=/root
2012-09-13 22:20:31,891:26922(0x7fb78eb13720):ZOO_INFO@log_env@699: Client environment:user.dir=/usr/local/mesos
2012-09-13 22:20:31,891:26922(0x7fb78eb13720):ZOO_INFO@zookeeper_init@727: Initiating client connection, host=localhost:2181 sessionTimeout=10000 watcher=0x7fb78def8450 sessionId=0 sessionPasswd=<null> context=0x11616c0 flags=0
2012-09-13 22:20:31,891:26922(0x7fb78eb13720):ZOO_DEBUG@start_threads@152: starting threads...
2012-09-13 22:20:31,892:26922(0x7fb78a435700):ZOO_DEBUG@do_io@279: started IO thread
2012-09-13 22:20:31,893:26922(0x7fb789a34700):ZOO_DEBUG@do_completion@326: started completion thread
2012-09-13 22:20:31,893:26922(0x7fb78a435700):ZOO_INFO@check_events@1585: initiated connection to server [127.0.0.1:2181]
I0913 22:20:31.960598 26941 webui.cpp:61] Loading webui script at '/usr/local/mesos/src/webui/slave/webui.py'
2012-09-13 22:20:31,972:26922(0x7fb78a435700):ZOO_INFO@check_events@1632: session establishment complete on server [127.0.0.1:2181], sessionId=0x139c14eb1dc0006, negotiated timeout=10000
2012-09-13 22:20:31,972:26922(0x7fb78a435700):ZOO_DEBUG@check_events@1638: Calling a watcher for a ZOO_SESSION_EVENT and the state=ZOO_CONNECTED_STATE
2012-09-13 22:20:31,972:26922(0x7fb789a34700):ZOO_DEBUG@process_completions@1765: Calling a watcher for node [], type = -1 event=ZOO_SESSION_EVENT
I0913 22:20:31.972904 26937 detector.cpp:287] Master detector connected to ZooKeeper ...
I0913 22:20:31.973017 26937 detector.cpp:316] Trying to create znode '/znode' in ZooKeeper
2012-09-13 22:20:31,973:26922(0x7fb78ba4c700):ZOO_DEBUG@zoo_acreate@2503: Sending request xid=0x50525c30 for path [/znode] to 127.0.0.1:2181
2012-09-13 22:20:31,980:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1989: Queueing asynchronous response
2012-09-13 22:20:31,980:26922(0x7fb789a34700):ZOO_DEBUG@process_completions@1817: Calling COMPLETION_STRING for xid=0x50525c30 rc=-110
Bottle server starting up (using WSGIRefServer())...
Listening on http://0.0.0.0:8081/
Use Ctrl-C to quit.

2012-09-13 22:20:35,311:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
2012-09-13 22:20:38,648:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
2012-09-13 22:20:41,986:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
2012-09-13 22:20:45,323:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
2012-09-13 22:20:48,660:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
2012-09-13 22:20:51,997:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
2012-09-13 22:20:55,334:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
2012-09-13 22:20:58,672:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 1 ms
2012-09-13 22:21:02,008:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
2012-09-13 22:21:05,346:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
2012-09-13 22:21:08,683:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
2012-09-13 22:21:12,020:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
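The slave's ping responses above arrive about every 3.34 s. That is roughly one third of the negotiated 10000 ms session timeout. My understanding is that the ZooKeeper C client sends a heartbeat ping once the connection has been idle for about a third of the session timeout, though I have not checked this against the 3.3.4 sources. If that is right, these lines may just be normal keep-alives rather than a symptom. The sketch below is a minimal Python way to measure the cadence from the log text. It assumes the "YYYY-MM-DD HH:MM:SS,mmm" timestamp format shown above, and ping_intervals is a hypothetical helper name.

```python
import re
from datetime import datetime

# Captures the leading "YYYY-MM-DD HH:MM:SS,mmm" timestamp of a ZooKeeper
# C client log entry that reports a ping response (format assumed from the
# logs above). Works whether entries are on separate lines or run together.
PING_PATTERN = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}):\d+\([^)]*\):"
    r"ZOO_DEBUG@zookeeper_process@\d+: Got ping response")


def ping_intervals(log_text):
    """Return the seconds elapsed between consecutive ping responses
    found in log_text (hypothetical helper)."""
    # %f accepts the 3-digit millisecond field and pads it to microseconds.
    stamps = [datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f")
              for ts in PING_PATTERN.findall(log_text)]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    sample = (
        "2012-09-13 22:20:35,311:26922(0x7fb78a435700):ZOO_DEBUG@"
        "zookeeper_process@1983: Got ping response in 0 ms\n"
        "2012-09-13 22:20:38,648:26922(0x7fb78a435700):ZOO_DEBUG@"
        "zookeeper_process@1983: Got ping response in 0 ms\n"
        "2012-09-13 22:20:41,986:26922(0x7fb78a435700):ZOO_DEBUG@"
        "zookeeper_process@1983: Got ping response in 0 ms\n"
    )
    negotiated_timeout_ms = 10000  # from "negotiated timeout=10000" above
    for gap in ping_intervals(sample):
        print("%.3f s between pings (timeout/3 = %.3f s)"
              % (gap, negotiated_timeout_ms / 3000.0))
```

The same call works on the master's ping lines. Its pings are spaced the same way, about 3.34 s apart.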

