[ https://issues.apache.org/jira/browse/MESOS-351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13578851#comment-13578851 ]
Vinod Kone commented on MESOS-351:
----------------------------------
Adding some more relevant data for posterity.
$ grep detector /var/log/mesos/mesos-slave.log
I0214 19:22:39.334646 15910 detector.cpp:375] Master detector lost connection to ZooKeeper, attempting to reconnect ...
W0214 19:22:49.337018 15906 detector.cpp:436] Timed out waiting to reconnect to ZooKeeper (sessionId=23969feb86ee1fd)
W0214 19:42:28.810739 15919 detector.cpp:394] Master detector ZooKeeper session expired!
I0214 19:42:28.915663 15897 detector.cpp:284] Master detector connected to ZooKeeper ...
I0214 19:42:28.915782 15897 detector.cpp:289] Authenticating to ZooKeeper using scheme 'digest'
I0214 19:42:32.252157 15897 detector.cpp:301] Trying to create path '/home/mesos/prod/master' in ZooKeeper
I0214 19:42:32.265502 15897 detector.cpp:486] Master detector found 3 registered masters
I0214 19:44:39.050536 15905 detector.cpp:375] Master detector lost connection to ZooKeeper, attempting to reconnect ...
W0214 19:44:49.052129 15910 detector.cpp:436] Timed out waiting to reconnect to ZooKeeper (sessionId=3969feb56c8ed1)
W0214 19:46:05.450275 15908 detector.cpp:394] Master detector ZooKeeper session expired!
I0214 19:46:12.131224 15903 detector.cpp:284] Master detector connected to ZooKeeper ...
I0214 19:46:12.131355 15903 detector.cpp:289] Authenticating to ZooKeeper using scheme 'digest'
I0214 19:46:15.466456 15903 detector.cpp:301] Trying to create path '/home/mesos/prod/master' in ZooKeeper
I0214 19:46:15.480890 15903 detector.cpp:486] Master detector found 3 registered masters
I0214 19:47:38.892096 15900 detector.cpp:375] Master detector lost connection to ZooKeeper, attempting to reconnect ...
W0214 19:47:48.893839 15913 detector.cpp:436] Timed out waiting to reconnect to ZooKeeper (sessionId=43969febe368cf6)
W0214 19:50:28.980358 15901 detector.cpp:394] Master detector ZooKeeper session expired!
I0214 19:50:28.990350 15913 detector.cpp:284] Master detector connected to ZooKeeper ...
I0214 19:50:28.990440 15913 detector.cpp:289] Authenticating to ZooKeeper using scheme 'digest'
I0214 19:50:32.326746 15913 detector.cpp:301] Trying to create path '/home/mesos/prod/master' in ZooKeeper
I0214 19:50:32.340391 15913 detector.cpp:486] Master detector found 3 registered masters
I0214 23:55:18.776129 15917 detector.cpp:375] Master detector lost connection to ZooKeeper, attempting to reconnect ...
W0214 23:55:18.779386 15919 detector.cpp:394] Master detector ZooKeeper session expired!
I0214 23:55:18.795264 15907 detector.cpp:284] Master detector connected to ZooKeeper ...
I0214 23:55:18.795362 15907 detector.cpp:289] Authenticating to ZooKeeper using scheme 'digest'
I0214 23:55:22.128913 15907 detector.cpp:301] Trying to create path '/home/mesos/prod/master' in ZooKeeper
I0214 23:55:22.143316 15907 detector.cpp:486] Master detector found 3 registered masters
I0215 00:43:10.738438 17081 detector.cpp:284] Master detector connected to ZooKeeper ...
I0215 00:43:10.738489 17081 detector.cpp:289] Authenticating to ZooKeeper using scheme 'digest'
I0215 00:43:14.072826 17081 detector.cpp:301] Trying to create path '/home/mesos/prod/master' in ZooKeeper
I0215 00:43:14.084419 17081 detector.cpp:486] Master detector found 3 registered masters
I0215 00:43:14.085523 17081 detector.cpp:532] Master detector got new master pid: master@xxxxxxxx
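
For anyone who wants to put numbers on the gaps in the log above, here is a rough Python sketch that measures the time between each "Timed out waiting to reconnect" warning and the next "session expired" warning. It assumes each log entry sits on a single line, as it would in the original mesos-slave.log, and that the prefix follows the usual glog layout (level letter, MMDD, time with microseconds, thread id, file:line). The function names `parse` and `stall_durations` are made up for this illustration. Whether a given gap corresponds to time spent blocked in detectMaster() is an inference, not something the log states.

```python
import re
from datetime import datetime

# glog prefix: level letter, MMDD, HH:MM:SS.micros, thread id, file:line]
GLOG = re.compile(
    r"^([IWEF])(\d{4} \d{2}:\d{2}:\d{2}\.\d{6})\s+(\d+) (\S+):(\d+)\] (.*)$"
)


def parse(line):
    """Return (timestamp, message) for a glog line, or None if it does not match."""
    m = GLOG.match(line)
    if m is None:
        return None
    # glog omits the year, so strptime falls back to its default year (1900).
    # That is fine for differences within one log, but not across a year boundary.
    ts = datetime.strptime(m.group(2), "%m%d %H:%M:%S.%f")
    return ts, m.group(6)


def stall_durations(lines):
    """Seconds between each reconnect timeout and the next session expiry."""
    gaps = []
    timed_out_at = None
    for line in lines:
        parsed = parse(line)
        if parsed is None:
            continue
        ts, msg = parsed
        if msg.startswith("Timed out waiting to reconnect"):
            timed_out_at = ts
        elif "session expired" in msg and timed_out_at is not None:
            gaps.append((ts - timed_out_at).total_seconds())
            timed_out_at = None
    return gaps


# Two entries copied from the log in this comment, joined onto single lines.
sample = [
    "W0214 19:22:49.337018 15906 detector.cpp:436] Timed out waiting to "
    "reconnect to ZooKeeper (sessionId=23969feb86ee1fd)",
    "W0214 19:42:28.810739 15919 detector.cpp:394] Master detector ZooKeeper "
    "session expired!",
]
print(stall_durations(sample))
```

For the first episode this gives roughly 1179 seconds, i.e. about 19 minutes 39 seconds between the reconnect timeout and the detector logging the session expiration.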
> Master detector should not block on zookeeper 'get' method
> ----------------------------------------------------------
>
> Key: MESOS-351
> URL: https://issues.apache.org/jira/browse/MESOS-351
> Project: Mesos
> Issue Type: Bug
> Reporter: Vinod Kone
>
> Currently, the master detector code calls zk->get() in its detectMaster()
> method. Under the covers, this calls zoo_aget(...., callback,....) and blocks
> until the callback is invoked.
> At Twitter, we encountered a situation where zk->get() blocked indefinitely
> because the underlying ZooKeeper library never invoked the callback. Even when
> a watch fired indicating that the session had expired, the detector did not
> process it because it was blocked in detectMaster().
> I will add some data below.
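
To make the failure mode in the description concrete, here is a minimal, language-neutral sketch in Python of one generic way to avoid blocking forever on a callback-style call: wait on the completion signal with a timeout and hand control back to the caller if the callback never arrives. This is not the Mesos detector code and not necessarily the fix adopted for this issue. `blocking_get`, `GetTimedOut`, `healthy_get`, and `stuck_get` are hypothetical names, and the `(rc, value)` callback shape is an assumption loosely modelled on an asynchronous get.

```python
import threading


class GetTimedOut(Exception):
    """Raised when the asynchronous get never invokes its callback in time."""


def blocking_get(async_get, path, timeout_secs):
    """Wrap a callback-style get so the caller never waits forever.

    async_get(path, callback) is assumed to return immediately and to call
    callback(rc, value) later from some other thread.
    """
    done = threading.Event()
    result = {}

    def callback(rc, value):
        result["rc"] = rc
        result["value"] = value
        done.set()

    async_get(path, callback)
    # Event.wait returns False if the timeout elapses before set() is called.
    if not done.wait(timeout_secs):
        raise GetTimedOut(path)
    return result["rc"], result["value"]


# Hypothetical stand-ins for the underlying client library.
def healthy_get(path, callback):
    # Invokes the callback shortly afterwards from a timer thread.
    threading.Timer(0.01, callback, args=(0, b"example-master")).start()


def stuck_get(path, callback):
    # Models the reported problem: the callback is never invoked.
    pass


print(blocking_get(healthy_get, "/example/path", 1.0))
try:
    blocking_get(stuck_get, "/example/path", 0.05)
except GetTimedOut as e:
    print("timed out waiting for", e)
```

With a bounded wait, the caller regains control after the timeout and can go on to handle other events, such as a session expiration, instead of staying parked in the get.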