Can you file a ticket please and assign it to me? I will take a look.

@vinodkone


On Thu, Sep 13, 2012 at 4:39 PM, Eugenia Gabrielova <[email protected]> wrote:

> Hi Vinod,
>
> At this time I believe I am on the up-to-date Mesos trunk version; the
> latest commit in the log is "Short term fix for ignoring executor resources
> during launch".
>
> Thanks for the ACLs tip, I hadn't thought of that. I took a look at my
> Zookeeper ACL permissions, and according to the documentation here (
> http://zookeeper.apache.org/doc/r3.1.2/zookeeperProgrammers.html#sc_ZooKeeperAccessControl),
> they should be permissive enough. Is there a preferred ACL configuration
> for Mesos?
>
> [zk: localhost:2181(CONNECTED) 0] getAcl /znode
> 'world,'anyone
> : cdrwa
> [zk: localhost:2181(CONNECTED) 1]
>
> Thank you,
> Eugenia
>
>
> On 09/13/2012 03:54 PM, Vinod Kone wrote:
>
>> Hi Eugenia,
>>
>> What version of Mesos are you running? I would suggest building and
>> running the latest trunk version.
>>
>> Also, is your local ZooKeeper set up with proper ACLs? I would look at
>> the ZooKeeper logs to see if they show anything suspicious because, to
>> me, it looks like both the master and slave are having a hard time
>> creating the znodes.
>>
>> @vinodkone
>>
>>
>> On Thu, Sep 13, 2012 at 3:33 PM, Eugenia Gabrielova <[email protected]> wrote:
>>
>>> Dear Mesos Development Team,
>>>
>>> I'm running into an interesting failover bug when running Mesos on
>>> Zookeeper. It seems to replicate this known failover bug, which is open:
>>> https://issues.apache.org/jira/browse/MESOS-246.
>>>
>>> Would you be willing to suggest a possible workaround or a fix I can make
>>> locally? I've tried a few configuration changes but am unable to find a
>>> fix.
>>>
>>> Please find below my output from just a local setup, running a local
>>> standalone Zookeeper + Mesos Master + Slave. I am running the latest
>>> Master, and Zookeeper 3.3.6 (Stable latest); operating system is CentOS
>>> 6.3. I do have Mesos running without issues on Zookeeper in one local
>>> development environment (RHEL 6.2) with an identical configuration, but
>>> in the other environments (one CentOS 6.3, the other RHEL 6.2), it encounters
>>> similar ZOO_DEBUG ->  ping 0ms behaviour on fresh installs. Please let me
>>> know if I can provide further configuration information.
>>>
>>> Sincerely,
>>> Eugenia
>>>
>>> Outputs:
>>> *Master*
>>> [...]> ./bin/mesos-master.sh --zk=zk://localhost:2181/znode
>>> I0913 22:19:44.869660 26895 main.cpp:115] Build: 2012-09-13 19:33:05 by root
>>> I0913 22:19:44.870185 26895 main.cpp:116] Starting Mesos master
>>> I0913 22:19:44.871429 26910 master.cpp:299] Master started on 10.180.4.184:5050
>>> I0913 22:19:44.871531 26910 master.cpp:314] Master ID: 201209132219-3087315978-5050-26895
>>> W0913 22:19:44.874285 26910 master.cpp:77] No whitelist given. Advertising offers for all slaves
>>> 2012-09-13 22:19:44,874:26895(0x7f04a41fb720):ZOO_INFO@log_env@658: Client environment:zookeeper.version=zookeeper C client 3.3.4
>>> 2012-09-13 22:19:44,874:26895(0x7f04a41fb720):ZOO_INFO@log_env@662: Client environment:host.name=science-falcon
>>> 2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@log_env@669: Client environment:os.name=Linux
>>> 2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@log_env@670: Client environment:os.arch=2.6.32-279.el6.x86_64
>>> 2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@log_env@671: Client environment:os.version=#1 SMP Fri Jun 22 12:19:21 UTC 2012
>>> 2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@log_env@679: Client environment:user.name=root
>>> 2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@log_env@687: Client environment:user.home=/root
>>> 2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@log_env@699: Client environment:user.dir=/usr/local/mesos
>>> 2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_INFO@zookeeper_init@727: Initiating client connection, host=localhost:2181 sessionTimeout=10000 watcher=0x7f04a35e0450 sessionId=0 sessionPasswd=<null> context=0x849ea0 flags=0
>>> 2012-09-13 22:19:44,875:26895(0x7f04a41fb720):ZOO_DEBUG@start_threads@152: starting threads...
>>> 2012-09-13 22:19:44,876:26895(0x7f049fb1d700):ZOO_DEBUG@do_io@279: started IO thread
>>> 2012-09-13 22:19:44,878:26895(0x7f049f11c700):ZOO_DEBUG@do_completion@326: started completion thread
>>> 2012-09-13 22:19:44,878:26895(0x7f049fb1d700):ZOO_INFO@check_events@1585: initiated connection to server [::1:2181]
>>> I0913 22:19:44.899587 26914 webui.cpp:61] Loading webui script at '/usr/local/mesos/src/webui/master/webui.py'
>>> 2012-09-13 22:19:44,974:26895(0x7f049fb1d700):ZOO_INFO@check_events@1632: session establishment complete on server [::1:2181], sessionId=0x139c14eb1dc0005, negotiated timeout=10000
>>> 2012-09-13 22:19:44,974:26895(0x7f049fb1d700):ZOO_DEBUG@check_events@1638: Calling a watcher for a ZOO_SESSION_EVENT and the state=ZOO_CONNECTED_STATE
>>> 2012-09-13 22:19:44,974:26895(0x7f049f11c700):ZOO_DEBUG@process_completions@1765: Calling a watcher for node [], type = -1 event=ZOO_SESSION_EVENT
>>> I0913 22:19:44.975180 26910 detector.cpp:287] Master detector connected to ZooKeeper ...
>>> I0913 22:19:44.975307 26910 detector.cpp:316] Trying to create znode '/znode' in ZooKeeper
>>> 2012-09-13 22:19:44,975:26895(0x7f04a1134700):ZOO_DEBUG@zoo_acreate@2503: Sending request xid=0x50525c01 for path [/znode] to ::1:2181
>>> 2012-09-13 22:19:44,982:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1989: Queueing asynchronous response
>>> 2012-09-13 22:19:44,982:26895(0x7f049f11c700):ZOO_DEBUG@process_completions@1817: Calling COMPLETION_STRING for xid=0x50525c01 rc=-110
>>> Bottle server starting up (using WSGIRefServer())...
>>> Listening on http://0.0.0.0:8080/
>>> Use Ctrl-C to quit.
>>>
>>> 2012-09-13 22:19:48,313:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>>> 2012-09-13 22:19:51,662:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 11 ms
>>> 2012-09-13 22:19:54,988:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>>> 2012-09-13 22:19:58,325:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>>> 2012-09-13 22:20:01,662:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>>> 2012-09-13 22:20:04,999:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>>> 2012-09-13 22:20:08,336:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>>> 2012-09-13 22:20:11,674:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>>> 2012-09-13 22:20:15,013:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 3 ms
>>> 2012-09-13 22:20:18,348:26895(0x7f049fb1d700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>>>
>>> *Slave*
>>> [...]> ./bin/mesos-slave.sh --master=zk://localhost:2181/znode "--resources=cpus:2;mem:1024"
>>> I0913 22:20:31.852876 26922 main.cpp:123] Creating "process" isolation module
>>> I0913 22:20:31.853536 26922 main.cpp:131] Build: 2012-09-13 19:33:05 by root
>>> I0913 22:20:31.853575 26922 main.cpp:132] Starting Mesos slave
>>> I0913 22:20:31.887912 26937 slave.cpp:172] Slave started on 1)@10.180.4.184:60711
>>> I0913 22:20:31.887969 26937 slave.cpp:173] Slave resources: cpus=2; mem=1024
>>> 2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@658: Client environment:zookeeper.version=zookeeper C client 3.3.4
>>> 2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@662: Client environment:host.name=science-falcon
>>> 2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@669: Client environment:os.name=Linux
>>> 2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@670: Client environment:os.arch=2.6.32-279.el6.x86_64
>>> 2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@671: Client environment:os.version=#1 SMP Fri Jun 22 12:19:21 UTC 2012
>>> 2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@679: Client environment:user.name=root
>>> 2012-09-13 22:20:31,890:26922(0x7fb78eb13720):ZOO_INFO@log_env@687: Client environment:user.home=/root
>>> 2012-09-13 22:20:31,891:26922(0x7fb78eb13720):ZOO_INFO@log_env@699: Client environment:user.dir=/usr/local/mesos
>>> 2012-09-13 22:20:31,891:26922(0x7fb78eb13720):ZOO_INFO@zookeeper_init@727: Initiating client connection, host=localhost:2181 sessionTimeout=10000 watcher=0x7fb78def8450 sessionId=0 sessionPasswd=<null> context=0x11616c0 flags=0
>>> 2012-09-13 22:20:31,891:26922(0x7fb78eb13720):ZOO_DEBUG@start_threads@152: starting threads...
>>> 2012-09-13 22:20:31,892:26922(0x7fb78a435700):ZOO_DEBUG@do_io@279: started IO thread
>>> 2012-09-13 22:20:31,893:26922(0x7fb789a34700):ZOO_DEBUG@do_completion@326: started completion thread
>>> 2012-09-13 22:20:31,893:26922(0x7fb78a435700):ZOO_INFO@check_events@1585: initiated connection to server [127.0.0.1:2181]
>>> I0913 22:20:31.960598 26941 webui.cpp:61] Loading webui script at '/usr/local/mesos/src/webui/slave/webui.py'
>>> 2012-09-13 22:20:31,972:26922(0x7fb78a435700):ZOO_INFO@check_events@1632: session establishment complete on server [127.0.0.1:2181], sessionId=0x139c14eb1dc0006, negotiated timeout=10000
>>> 2012-09-13 22:20:31,972:26922(0x7fb78a435700):ZOO_DEBUG@check_events@1638: Calling a watcher for a ZOO_SESSION_EVENT and the state=ZOO_CONNECTED_STATE
>>> 2012-09-13 22:20:31,972:26922(0x7fb789a34700):ZOO_DEBUG@process_completions@1765: Calling a watcher for node [], type = -1 event=ZOO_SESSION_EVENT
>>> I0913 22:20:31.972904 26937 detector.cpp:287] Master detector connected to ZooKeeper ...
>>> I0913 22:20:31.973017 26937 detector.cpp:316] Trying to create znode '/znode' in ZooKeeper
>>> 2012-09-13 22:20:31,973:26922(0x7fb78ba4c700):ZOO_DEBUG@zoo_acreate@2503: Sending request xid=0x50525c30 for path [/znode] to 127.0.0.1:2181
>>> 2012-09-13 22:20:31,980:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1989: Queueing asynchronous response
>>> 2012-09-13 22:20:31,980:26922(0x7fb789a34700):ZOO_DEBUG@process_completions@1817: Calling COMPLETION_STRING for xid=0x50525c30 rc=-110
>>> Bottle server starting up (using WSGIRefServer())...
>>> Listening on http://0.0.0.0:8081/
>>> Use Ctrl-C to quit.
>>>
>>> 2012-09-13 22:20:35,311:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>>> 2012-09-13 22:20:38,648:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>>> 2012-09-13 22:20:41,986:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>>> 2012-09-13 22:20:45,323:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>>> 2012-09-13 22:20:48,660:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>>> 2012-09-13 22:20:51,997:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>>> 2012-09-13 22:20:55,334:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>>> 2012-09-13 22:20:58,672:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 1 ms
>>> 2012-09-13 22:21:02,008:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>>> 2012-09-13 22:21:05,346:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>>> 2012-09-13 22:21:08,683:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>>> 2012-09-13 22:21:12,020:26922(0x7fb78a435700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
>
