Can you file a ticket please and assign it to me? I will take a look.

@vinodkone

On Thu, Sep 13, 2012 at 4:39 PM, Eugenia Gabrielova <[email protected]> wrote:

> Hi Vinod,
>
> At this time I believe I am on the up-to-date Mesos trunk version; the
> latest commit in the log is "Short term fix for ignoring executor resources
> during launch".
>
> Thanks for the ACLs tip, I hadn't thought of that. I took a look at my
> Zookeeper ACL permissions, and according to the documentation here
> (http://zookeeper.apache.org/doc/r3.1.2/zookeeperProgrammers.html#sc_ZooKeeperAccessControl),
> they should be permissive enough. Is there a preferred ACL configuration
> for Mesos?
>
> [zk: localhost:2181(CONNECTED) 0] getAcl /znode
> 'world,'anyone
> : cdrwa
> [zk: localhost:2181(CONNECTED) 1]
>
> Thank you,
> Eugenia
>
>
> On 09/13/2012 03:54 PM, Vinod Kone wrote:
>
>> Hi Eugenia,
>>
>> What version of mesos are you running? I would suggest building and running
>> the latest trunk version.
>>
>> Also, is your local zookeeper setup with proper ACLs? I would look at
>> zookeeper logs to see if it shows anything suspicious? Because, to me, it
>> looks like both the master and slave are having a hard time creating the
>> znodes.
>>
>> @vinodkone
>>
>>
>> On Thu, Sep 13, 2012 at 3:33 PM, Eugenia Gabrielova <[email protected]> wrote:
>>
>>> Dear Mesos Development Team,
>>>
>>> I'm running into an interesting failover bug when running Mesos on
>>> Zookeeper. It seems to replicate this known failover bug, which is open:
>>> https://issues.apache.org/jira/browse/MESOS-246.
>>>
>>> Would you be willing to suggest a possible workaround or a fix I can make
>>> locally? I've tried a few configuration changes but am unable to find a
>>> fix.
>>>
>>> Please find below my output from just a local setup, running a local
>>> standalone Zookeeper + Mesos Master + Slave. I am running the latest
>>> Master, and Zookeeper 3.3.6 (Stable latest); operating system is CentOS
>>> 6.3. I do have Mesos running without issues on Zookeeper in one local
>>> development environment (RHEL 6.2) with an identical configuration, but in
>>> each other environment (one CentOS 6.3, other RHEL 6.2), it encounters
>>> similar ZOO_DEBUG -> ping 0ms behaviour on fresh installs. Please let me
>>> know if I can provide further configuration information.
>>>
>>> Sincerely,
>>> Eugenia
>>>
>>> Outputs:
>>>
>>> *Master*
>>> [...]> ./bin/mesos-master.sh --zk=zk://localhost:2181/znode
>>> I0913 22:19:44.869660 26895 main.cpp:115] Build: 2012-09-13 19:33:05 by root
>>> I0913 22:19:44.870185 26895 main.cpp:116] Starting Mesos master
>>> I0913 22:19:44.871429 26910 master.cpp:299] Master started on 10.180.4.184:5050
>>> I0913 22:19:44.871531 26910 master.cpp:314] Master ID: 201209132219-3087315978-5050-26895
>>> W0913 22:19:44.874285 26910 master.cpp:77] No whitelist given. Advertising offers for all slaves
>>> 2012-09-13 22:19:44,874:26895(0x7f04a41fb720):ZOO_INFO@log_env@658: Client environment:zookeeper.version=zookeeper C client 3.3.4
>>> 2012-09-13 22:19:44,874:26895(0x7f04a41fb720):ZOO_INFO@log_env@662: Client environment:host.name=science-falcon
>>> 2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@log_env@669: Client environment:os.name=Linux
>>> 2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@log_env@670: Client environment:os.arch=2.6.32-279.el6.x86_64
>>> 2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@log_env@671: Client environment:os.version=#1 SMP Fri Jun 22 12:19:21 UTC 2012
>>> 2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@log_env@679: Client environment:user.name=root
>>> 2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@log_env@687: Client environment:user.home=/root
>>> 2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@log_env@699: Client environment:user.dir=/usr/local/mesos
>>> 2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@zookeeper_init@727: Initiating client connection, host=localhost:2181 sessionTimeout=10000 watcher=0x7f04a35e0450 sessionId=0 sessionPasswd=<null> context=0x849ea0 flags=0
>>> 2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_DEBUG@start_threads@152: starting threads...
>>> 2012-09-13 22:19:44,876:26895(0x7f049fb1d700):ZOO_DEBUG@do_io@279: started IO thread
>>> 2012-09-13 22:19:44,878:26895(0x7f049f11c700):ZOO_DEBUG@do_completion@326: started completion thread
>>> 2012-09-13 22:19:44,878:26895(0x7f049fb1d700):ZOO_INFO@check_events@1585: initiated connection to server [::1:2181]
>>> I0913 22:19:44.899587 26914 webui.cpp:61] Loading webui script at '/usr/local/mesos/src/webui/master/webui.py'
>>> 2012-09-13 22:19:44,974:26895(0x7f049fb1d700):ZOO_INFO@check_events@1632: session establishment complete on server [::1:2181], sessionId=0x139c14eb1dc0005, negotiated timeout=10000
>>> 2012-09-13 22:19:44,974:26895(0x7f049fb1d700):ZOO_DEBUG@check_events@1638: Calling a watcher for a ZOO_SESSION_EVENT and the state=ZOO_CONNECTED_STATE
>>> 2012-09-13 22:19:44,974:26895(0x7f049f11c700):ZOO_DEBUG@process_completions@1765: Calling a watcher for node [], type = -1 event=ZOO_SESSION_EVENT
>>> I0913 22:19:44.975180 26910 detector.cpp:287] Master detector connected to ZooKeeper ...
>>> I0913 22:19:44.975307 26910 detector.cpp:316] Trying to create znode '/znode' in ZooKeeper
>>> 2012-09-13 22:19:44,975:26895(0x7f04a1134700):ZOO_DEBUG@zoo_acreate@2503: Sending request xid=0x50525c01 for path [/znode] to ::1:2181
>>> 2012-09-13 22:19:44,982:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1989: Queueing asynchronous response
>>> 2012-09-13 22:19:44,982:26895(0x7f049f11c700):ZOO_DEBUG@process_completions@1817: Calling COMPLETION_STRING for xid=0x50525c01 rc=-110
>>> Bottle server starting up (using WSGIRefServer())...
>>> Listening on http://0.0.0.0:8080/
>>> Use Ctrl-C to quit.
>>>
>>> 2012-09-13 22:19:48,313:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>>> 2012-09-13 22:19:51,662:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 11 ms
>>> 2012-09-13 22:19:54,988:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>>> 2012-09-13 22:19:58,325:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>>> 2012-09-13 22:20:01,662:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>>> 2012-09-13 22:20:04,999:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>>> 2012-09-13 22:20:08,336:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>>> 2012-09-13 22:20:11,674:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>>> 2012-09-13 22:20:15,013:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 3 ms
>>> 2012-09-13 22:20:18,348:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>>>
>>> *Slave*
>>> [...]> ./bin/mesos-slave.sh --master=zk://localhost:2181/znode "--resources=cpus:2;mem:1024"
>>> I0913 22:20:31.852876 26922 main.cpp:123] Creating "process" isolation module
>>> I0913 22:20:31.853536 26922 main.cpp:131] Build: 2012-09-13 19:33:05 by root
>>> I0913 22:20:31.853575 26922 main.cpp:132] Starting Mesos slave
>>> I0913 22:20:31.887912 26937 slave.cpp:172] Slave started on 1)@10.180.4.184:60711
>>> I0913 22:20:31.887969 26937 slave.cpp:173] Slave resources: cpus=2; mem=1024
>>> 2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@658: Client environment:zookeeper.version=zookeeper C client 3.3.4
>>> 2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@662: Client environment:host.name=science-falcon
>>> 2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@669: Client environment:os.name=Linux
>>> 2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@670: Client environment:os.arch=2.6.32-279.el6.x86_64
>>> 2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@671: Client environment:os.version=#1 SMP Fri Jun 22 12:19:21 UTC 2012
>>> 2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@679: Client environment:user.name=root
>>> 2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@687: Client environment:user.home=/root
>>> 2012-09-13 22:20:31,891:26922(0x7fb78eb13720):ZOO_INFO@log_env@699: Client environment:user.dir=/usr/local/mesos
>>> 2012-09-13 22:20:31,891:26922(0x7fb78eb13720):ZOO_INFO@zookeeper_init@727: Initiating client connection, host=localhost:2181 sessionTimeout=10000 watcher=0x7fb78def8450 sessionId=0 sessionPasswd=<null> context=0x11616c0 flags=0
>>> 2012-09-13 22:20:31,891:26922(0x7fb78eb13720):ZOO_DEBUG@start_threads@152: starting threads...
>>> 2012-09-13 22:20:31,892:26922(0x7fb78a435700):ZOO_DEBUG@do_io@279: started IO thread
>>> 2012-09-13 22:20:31,893:26922(0x7fb789a34700):ZOO_DEBUG@do_completion@326: started completion thread
>>> 2012-09-13 22:20:31,893:26922(0x7fb78a435700):ZOO_INFO@check_events@1585: initiated connection to server [127.0.0.1:2181]
>>> I0913 22:20:31.960598 26941 webui.cpp:61] Loading webui script at '/usr/local/mesos/src/webui/slave/webui.py'
>>> 2012-09-13 22:20:31,972:26922(0x7fb78a435700):ZOO_INFO@check_events@1632: session establishment complete on server [127.0.0.1:2181], sessionId=0x139c14eb1dc0006, negotiated timeout=10000
>>> 2012-09-13 22:20:31,972:26922(0x7fb78a435700):ZOO_DEBUG@check_events@1638: Calling a watcher for a ZOO_SESSION_EVENT and the state=ZOO_CONNECTED_STATE
>>> 2012-09-13 22:20:31,972:26922(0x7fb789a34700):ZOO_DEBUG@process_completions@1765: Calling a watcher for node [], type = -1 event=ZOO_SESSION_EVENT
>>> I0913 22:20:31.972904 26937 detector.cpp:287] Master detector connected to ZooKeeper ...
>>> I0913 22:20:31.973017 26937 detector.cpp:316] Trying to create znode '/znode' in ZooKeeper
>>> 2012-09-13 22:20:31,973:26922(0x7fb78ba4c700):ZOO_DEBUG@zoo_acreate@2503: Sending request xid=0x50525c30 for path [/znode] to 127.0.0.1:2181
>>> 2012-09-13 22:20:31,980:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1989: Queueing asynchronous response
>>> 2012-09-13 22:20:31,980:26922(0x7fb789a34700):ZOO_DEBUG@process_completions@1817: Calling COMPLETION_STRING for xid=0x50525c30 rc=-110
>>> Bottle server starting up (using WSGIRefServer())...
>>> Listening on http://0.0.0.0:8081/
>>> Use Ctrl-C to quit.
>>>
>>> 2012-09-13 22:20:35,311:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>>> 2012-09-13 22:20:38,648:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>>> 2012-09-13 22:20:41,986:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>>> 2012-09-13 22:20:45,323:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>>> 2012-09-13 22:20:48,660:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>>> 2012-09-13 22:20:51,997:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>>> 2012-09-13 22:20:55,334:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>>> 2012-09-13 22:20:58,672:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 1 ms
>>> 2012-09-13 22:21:02,008:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>>> 2012-09-13 22:21:05,346:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>>> 2012-09-13 22:21:08,683:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>>> 2012-09-13 22:21:12,020:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
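
A note for anyone reading the quoted logs: the rc=-110 on the COMPLETION_STRING lines is the return code of the znode create request. In the ZooKeeper C client header (zookeeper.h), -110 is ZNODEEXISTS, whereas an ACL rejection would normally show up as ZNOAUTH (-102). That would suggest '/znode' already existed when the master and slave tried to create it, though the logs alone do not settle whether this is related to the failover issue. Below is a minimal sketch for decoding these codes from a pasted log. The helper names (decode_rc, completion_codes) are made up for illustration, and the table is only a subset of the client's error codes.

```python
import re

# Subset of the ZOO_ERRORS enum from the ZooKeeper C client (zookeeper.h).
# Not exhaustive; codes missing here are reported as UNKNOWN.
ZOO_ERRORS = {
    0: "ZOK",
    -4: "ZCONNECTIONLOSS",
    -7: "ZOPERATIONTIMEOUT",
    -101: "ZNONODE",
    -102: "ZNOAUTH",
    -110: "ZNODEEXISTS",
    -112: "ZSESSIONEXPIRED",
    -114: "ZINVALIDACL",
    -115: "ZAUTHFAILED",
}

# Matches the "rc=<number>" field the client prints on completion lines.
RC_PATTERN = re.compile(r"\brc=(-?\d+)")


def decode_rc(code):
    """Return the symbolic name for a ZooKeeper C client return code."""
    return ZOO_ERRORS.get(code, "UNKNOWN(%d)" % code)


def completion_codes(log_text):
    """Decode every rc=<n> value found in a chunk of client log output."""
    return [decode_rc(int(value)) for value in RC_PATTERN.findall(log_text)]


if __name__ == "__main__":
    sample = (
        "process_completions@1817: Calling COMPLETION_STRING "
        "for xid=0x50525c01 rc=-110"
    )
    print(completion_codes(sample))  # prints ['ZNODEEXISTS']
```

Running this over the master and slave output should report ZNODEEXISTS for both create requests.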
