Just a note that if you increase the number of detector instances, you also increase the number of instances of our `Zookeeper` client class. This class was written a long time ago, so it still exposes a blocking interface rather than a Future-based non-blocking one. As a result, the minimum number of threads libprocess needs to avoid deadlocking will increase:
https://issues.apache.org/jira/browse/MESOS-8255

On Tue, Jan 16, 2018 at 1:32 PM, Avinash Sridharan <avin...@mesosphere.io> wrote:

> That's correct. The number of watchers here would be doubled. Also, note that the agent `MasterDetector` is actually checking the leader amongst the `overlay-master` modules and not the Mesos master itself. Since these are anonymous modules there is no way to share the slave/masters objects with the modules (at least the intention is not to have any sharing). That said, we haven't seen any real impact on Zookeeper due to these extra detectors/watchers being in place in practice. However, I would like to post a disclaimer that this statement is more anecdotal rather than based on some specific experiments. We haven't done any benchmarking ourselves to verify the exact increase in load with these extra detectors in place.
>
> One thing to keep in mind is that Master failovers are relatively rare events (or at least they should be in a stable system). This implies that while you might see a spike in the load for zookeeper watchers when the event does occur, this is not something that should stress zookeeper out.
>
> On Tue, Jan 16, 2018 at 1:15 PM, Zhitao Li <zhitaoli...@gmail.com> wrote:
>
> > Hi Avinash,
> >
> > Thanks for the pointer. A quick scan seems to suggest that your implementation requires it to create two different instances of `MasterDetector`, one for master/agent itself, and one for your module. That probably means number of watchers on zookeeper would be doubled? Would that create unnecessary load?
> >
> > On Tue, Jan 16, 2018 at 11:22 AM, Avinash Sridharan <avin...@mesosphere.io> wrote:
> >
> > > Hi Zhitao,
> > > We actually do this in the `DC/OS` overlay modules we use in DC/OS. The overlay modules run as Mesos modules both in the Master and the Agent. You can see how we use the `MasterDetector` in the agent module:
> > > https://github.com/dcos/dcos-mesos-modules/blob/master/overlay/agent.cpp#L74
> > >
> > > https://github.com/dcos/dcos-mesos-modules/blob/master/overlay/ should have more details of how the modules are actually used to build DC/OS overlay.
> > >
> > > On Tue, Jan 16, 2018 at 10:20 AM, Zhitao Li <zhitaoli...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > Some of our future development work on our custom modules requires them to know the current leader of Mesos master. While it seems like we could duplicate the logic in master/slave side to duplicate an instance of `MasterDetector`, it seems more natural if we could figure out a clean way to share the existing detector instance in slave to the module.
> > > >
> > > > Any comment on this idea?
> > > >
> > > > Thanks.
> > > >
> > > > --
> > > > Cheers,
> > > >
> > > > Zhitao Li
> > >
> > > --
> > > Avinash Sridharan, Mesosphere
> > > +1 (323) 702 5245
> >
> > --
> > Cheers,
> >
> > Zhitao Li
>
> --
> Avinash Sridharan, Mesosphere
> +1 (323) 702 5245
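
To illustrate the thread requirement mentioned at the top of this message, here is a minimal sketch of why blocking calls on a fixed-size worker pool raise the minimum thread count. It is a toy model only, not libprocess or Mesos code: the function `completes` and its parameters are hypothetical, and it assumes each blocking call can return only after a callback has run on the same pool. The actual minimum for libprocess is whatever the JIRA ticket describes.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical model: `clients` blocking calls run on a pool of `workers`
// threads. Each call returns only after its completion callback has run on
// the same pool. Returns true if every call finishes before `timeout`.
bool completes(int workers, int clients,
               std::chrono::milliseconds timeout = std::chrono::milliseconds(500))
{
  assert(workers > 0 && clients >= 0);

  std::mutex m;
  std::condition_variable cv;
  std::deque<std::function<void()>> queue;
  std::vector<bool> done(clients, false);
  int finished = 0;
  bool shutdown = false;

  // Blocking calls are queued first, so they occupy workers before any
  // callback gets a chance to run.
  for (int i = 0; i < clients; ++i) {
    queue.push_back([&, i] {
      std::unique_lock<std::mutex> lock(m);
      cv.wait(lock, [&] { return done[i] || shutdown; });
      if (done[i]) { ++finished; cv.notify_all(); }
    });
  }

  // Completion callbacks: each one unblocks the matching call.
  for (int i = 0; i < clients; ++i) {
    queue.push_back([&, i] {
      std::lock_guard<std::mutex> lock(m);
      done[i] = true;
      cv.notify_all();
    });
  }

  // Each worker pops tasks until the queue is empty or shutdown is requested.
  auto worker = [&] {
    for (;;) {
      std::function<void()> task;
      {
        std::lock_guard<std::mutex> lock(m);
        if (shutdown || queue.empty()) return;
        task = std::move(queue.front());
        queue.pop_front();
      }
      task();  // Runs without holding the lock; may block this worker.
    }
  };

  std::vector<std::thread> pool;
  for (int i = 0; i < workers; ++i) pool.emplace_back(worker);

  bool ok;
  {
    std::unique_lock<std::mutex> lock(m);
    ok = cv.wait_for(lock, timeout, [&] { return finished == clients; });
    shutdown = true;  // Release any workers that are still blocked.
  }
  cv.notify_all();
  for (auto& t : pool) t.join();
  return ok;
}
```

In this model, `completes(3, 2)` returns true, while `completes(2, 2)` returns false after the timeout because both workers are stuck inside blocking calls and no thread is left to run the callbacks. Each extra blocking client therefore raises the number of workers the pool needs by one.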