Dear Mesos Development Team,

I'm running into a failover bug when running Mesos with ZooKeeper. It
appears to reproduce a known failover bug that is still open:
https://issues.apache.org/jira/browse/MESOS-246. Could you suggest a
possible workaround, or a fix I can apply locally? I've tried a few
configuration changes, but none of them resolved it.

Below is the output from a purely local setup: a standalone ZooKeeper
plus one Mesos master and one slave. I am running the latest master
and ZooKeeper 3.3.6 (latest stable); the operating system is CentOS
6.3. Mesos does run without issues on ZooKeeper in one local
development environment (RHEL 6.2) with an identical configuration, but
in each of the other two environments (one CentOS 6.3, the other RHEL
6.2), fresh installs show the same behaviour, with ZOO_DEBUG reporting
"Got ping response in 0 ms" over and over.

Please let me know if I can provide further configuration information.

Sincerely,
Eugenia

Outputs:

*Master*
[...] > ./bin/mesos-master.sh --zk=zk://localhost:2181/znode
I0913 22:19:44.869660 26895 main.cpp:115] Build: 2012-09-13 19:33:05 by root
I0913 22:19:44.870185 26895 main.cpp:116] Starting Mesos master
I0913 22:19:44.871429 26910 master.cpp:299] Master started on 10.180.4.184:5050
I0913 22:19:44.871531 26910 master.cpp:314] Master ID: 201209132219-3087315978-5050-26895
W0913 22:19:44.874285 26910 master.cpp:77] No whitelist given. Advertising offers for all slaves
2012-09-13 22:19:44,874:26895(0x7f04a41fb720):ZOO_INFO@log_env@658: Client environment:zookeeper.version=zookeeper C client 3.3.4
2012-09-13 22:19:44,874:26895(0x7f04a41fb720):ZOO_INFO@log_env@662: Client environment:host.name=science-falcon
2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@log_env@669: Client environment:os.name=Linux
2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@log_env@670: Client environment:os.arch=2.6.32-279.el6.x86_64
2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@log_env@671: Client environment:os.version=#1 SMP Fri Jun 22 12:19:21 UTC 2012
2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@log_env@679: Client environment:user.name=root
2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@log_env@687: Client environment:user.home=/root
2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@log_env@699: Client environment:user.dir=/usr/local/mesos
2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@zookeeper_init@727: Initiating client connection, host=localhost:2181 sessionTimeout=10000 watcher=0x7f04a35e0450 sessionId=0 sessionPasswd=<null> context=0x849ea0 flags=0
2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_DEBUG@start_threads@152: starting threads...
2012-09-13 22:19:44,876:26895(0x7f049fb1d700):ZOO_DEBUG@do_io@279: started IO thread
2012-09-13 22:19:44,878:26895(0x7f049f11c700):ZOO_DEBUG@do_completion@326: started completion thread
2012-09-13 22:19:44,878:26895(0x7f049fb1d700):ZOO_INFO@check_events@1585: initiated connection to server [::1:2181]
I0913 22:19:44.899587 26914 webui.cpp:61] Loading webui script at '/usr/local/mesos/src/webui/master/webui.py'
2012-09-13 22:19:44,974:26895(0x7f049fb1d700):ZOO_INFO@check_events@1632: session establishment complete on server [::1:2181], sessionId=0x139c14eb1dc0005, negotiated timeout=10000
2012-09-13 22:19:44,974:26895(0x7f049fb1d700):ZOO_DEBUG@check_events@1638: Calling a watcher for a ZOO_SESSION_EVENT and the state=ZOO_CONNECTED_STATE
2012-09-13 22:19:44,974:26895(0x7f049f11c700):ZOO_DEBUG@process_completions@1765: Calling a watcher for node [], type = -1 event=ZOO_SESSION_EVENT
I0913 22:19:44.975180 26910 detector.cpp:287] Master detector connected to ZooKeeper ...
I0913 22:19:44.975307 26910 detector.cpp:316] Trying to create znode '/znode' in ZooKeeper
2012-09-13 22:19:44,975:26895(0x7f04a1134700):ZOO_DEBUG@zoo_acreate@2503: Sending request xid=0x50525c01 for path [/znode] to ::1:2181
2012-09-13 22:19:44,982:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1989: Queueing asynchronous response
2012-09-13 22:19:44,982:26895(0x7f049f11c700):ZOO_DEBUG@process_completions@1817: Calling COMPLETION_STRING for xid=0x50525c01 rc=-110
Bottle server starting up (using WSGIRefServer())...
Listening on http://0.0.0.0:8080/
Use Ctrl-C to quit.
2012-09-13 22:19:48,313:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
2012-09-13 22:19:51,662:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 11 ms
2012-09-13 22:19:54,988:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
2012-09-13 22:19:58,325:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
2012-09-13 22:20:01,662:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
2012-09-13 22:20:04,999:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
2012-09-13 22:20:08,336:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
2012-09-13 22:20:11,674:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
2012-09-13 22:20:15,013:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 3 ms
2012-09-13 22:20:18,348:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms

*Slave*
[...] > ./bin/mesos-slave.sh --master=zk://localhost:2181/znode "--resources=cpus:2;mem:1024"
I0913 22:20:31.852876 26922 main.cpp:123] Creating "process" isolation module
I0913 22:20:31.853536 26922 main.cpp:131] Build: 2012-09-13 19:33:05 by root
I0913 22:20:31.853575 26922 main.cpp:132] Starting Mesos slave
I0913 22:20:31.887912 26937 slave.cpp:172] Slave started on 1)@10.180.4.184:60711
I0913 22:20:31.887969 26937 slave.cpp:173] Slave resources: cpus=2; mem=1024
2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@658: Client environment:zookeeper.version=zookeeper C client 3.3.4
2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@662: Client environment:host.name=science-falcon
2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@669: Client environment:os.name=Linux
2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@670: Client environment:os.arch=2.6.32-279.el6.x86_64
2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@671: Client environment:os.version=#1 SMP Fri Jun 22 12:19:21 UTC 2012
2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@679: Client environment:user.name=root
2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@687: Client environment:user.home=/root
2012-09-13 22:20:31,891:26922(0x7fb78eb13720):ZOO_INFO@log_env@699: Client environment:user.dir=/usr/local/mesos
2012-09-13 22:20:31,891:26922(0x7fb78eb13720):ZOO_INFO@zookeeper_init@727: Initiating client connection, host=localhost:2181 sessionTimeout=10000 watcher=0x7fb78def8450 sessionId=0 sessionPasswd=<null> context=0x11616c0 flags=0
2012-09-13 22:20:31,891:26922(0x7fb78eb13720):ZOO_DEBUG@start_threads@152: starting threads...
2012-09-13 22:20:31,892:26922(0x7fb78a435700):ZOO_DEBUG@do_io@279: started IO thread
2012-09-13 22:20:31,893:26922(0x7fb789a34700):ZOO_DEBUG@do_completion@326: started completion thread
2012-09-13 22:20:31,893:26922(0x7fb78a435700):ZOO_INFO@check_events@1585: initiated connection to server [127.0.0.1:2181]
I0913 22:20:31.960598 26941 webui.cpp:61] Loading webui script at '/usr/local/mesos/src/webui/slave/webui.py'
2012-09-13 22:20:31,972:26922(0x7fb78a435700):ZOO_INFO@check_events@1632: session establishment complete on server [127.0.0.1:2181], sessionId=0x139c14eb1dc0006, negotiated timeout=10000
2012-09-13 22:20:31,972:26922(0x7fb78a435700):ZOO_DEBUG@check_events@1638: Calling a watcher for a ZOO_SESSION_EVENT and the state=ZOO_CONNECTED_STATE
2012-09-13 22:20:31,972:26922(0x7fb789a34700):ZOO_DEBUG@process_completions@1765: Calling a watcher for node [], type = -1 event=ZOO_SESSION_EVENT
I0913 22:20:31.972904 26937 detector.cpp:287] Master detector connected to ZooKeeper ...
I0913 22:20:31.973017 26937 detector.cpp:316] Trying to create znode '/znode' in ZooKeeper
2012-09-13 22:20:31,973:26922(0x7fb78ba4c700):ZOO_DEBUG@zoo_acreate@2503: Sending request xid=0x50525c30 for path [/znode] to 127.0.0.1:2181
2012-09-13 22:20:31,980:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1989: Queueing asynchronous response
2012-09-13 22:20:31,980:26922(0x7fb789a34700):ZOO_DEBUG@process_completions@1817: Calling COMPLETION_STRING for xid=0x50525c30 rc=-110
Bottle server starting up (using WSGIRefServer())...
Listening on http://0.0.0.0:8081/
Use Ctrl-C to quit.
2012-09-13 22:20:35,311:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
2012-09-13 22:20:38,648:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
2012-09-13 22:20:41,986:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
2012-09-13 22:20:45,323:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
2012-09-13 22:20:48,660:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
2012-09-13 22:20:51,997:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
2012-09-13 22:20:55,334:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
2012-09-13 22:20:58,672:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 1 ms
2012-09-13 22:21:02,008:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
2012-09-13 22:21:05,346:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
2012-09-13 22:21:08,683:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
2012-09-13 22:21:12,020:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
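
A note for anyone reading the logs above: both the master and the slave get rc=-110 back from their create request on '/znode'. In the ZooKeeper C client, -110 is ZNODEEXISTS, meaning the path was already present when the create was attempted. Below is a minimal Python sketch for pulling these return codes out of a pasted log. The helper name decode_return_codes and the choice of which codes to include in the table are my own assumptions for illustration; the numeric values are taken from the C client's zookeeper.h, and only a small subset is listed.

```python
import re

# Subset of return codes from the ZooKeeper C client header (zookeeper.h).
# The table is deliberately incomplete; unknown codes are reported as such.
ZK_RETURN_CODES = {
    0: "ZOK",
    -4: "ZCONNECTIONLOSS",
    -7: "ZOPERATIONTIMEOUT",
    -101: "ZNONODE",
    -102: "ZNOAUTH",
    -110: "ZNODEEXISTS",
    -112: "ZSESSIONEXPIRED",
}

# Matches the "xid=0x... rc=N" fragment printed by process_completions.
RC_PATTERN = re.compile(r"xid=(0x[0-9a-fA-F]+) rc=(-?\d+)")


def decode_return_codes(log_text):
    """Return a list of (xid, rc, symbolic name) tuples found in log_text.

    Hypothetical helper, not part of Mesos or ZooKeeper.
    """
    results = []
    for match in RC_PATTERN.finditer(log_text):
        xid = match.group(1)
        rc = int(match.group(2))
        results.append((xid, rc, ZK_RETURN_CODES.get(rc, "UNKNOWN")))
    return results


if __name__ == "__main__":
    sample = "Calling COMPLETION_STRING for xid=0x50525c01 rc=-110"
    for xid, rc, name in decode_return_codes(sample):
        print(xid, rc, name)
```

Run against the two outputs above, this would flag one ZNODEEXISTS per process, which on its own only says that '/znode' already existed at startup; it does not by itself explain the failover behaviour.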