Hi Roman,

The code in question lives here (reactive handlers):
https://github.com/juju-solutions/bigtop/blob/zookeeper/bigtop-packages/src/charm/zookeeper/layer-zookeeper/reactive/zookeeper.py
And here (lib class that actually does the work):
https://github.com/juju-solutions/bigtop/blob/zookeeper/bigtop-packages/src/charm/zookeeper/layer-zookeeper/lib/charms/layer/zookeeper.py

> That's actually where the dragons may be.

Dragons are interesting. I will give the link a read-through, and think about what I need to do in order to validate that our charm plays nicely with them. :-)

On Mon, Jun 13, 2016 at 4:35 PM Roman Shaposhnik <[email protected]> wrote:
> On Mon, Jun 13, 2016 at 1:01 PM, Pete Vander Giessen
> <[email protected]> wrote:
> > Hi Roman,
> >
> > Welcome back! I hope that you had an excellent vacation :-)
>
> Thanks! It was a pretty awesome trek through Fränkische Schweiz.
>
> >> I think Juju can help solve it with scaling up, but not down (and not
> >> down-up).
> >> IOW, adding new nodes to the cluster should be no problem. If X nodes go
> >> down and Juju needs to bring fresh instances back up I think you'll run
> >> into issues at the level of ZK implementation itself.
> >
> > We have a routine for removing nodes in the Juju charm. Juju won't re-use
> > ids by default, and I think that it will automatically hit the
> > "decrease_quorum" routine when a node goes down.
>
> Can you please point me (link) to a code base so we can be sure we're
> talking about the same thing? I'd like to see the implementation of this
> "decrease_quorum" routine.
>
> >> Definitely sounds reasonable going up (growing a cluster). But even in
> >> that case, on the client side (e.g. anything like HBase or Giraph
> >> actually using ZK as a coordination service) you'll be stuck with a
> >> stale list of the ZK ensemble.
> >> Not sure how that can be helped.
> >
> > The charm currently does the following on each ZK node whenever a node is
> > added, or goes away:
> >
> > * Reads out the current list of nodes from the config (currently zoo.cfg,
> >   though I have a TODO to make it just read stuff out of that ensemble
> >   value, instead).
> > * Adds or removes the node in question from the list.
> > * Writes out the new list to ensemble.
> > * Re-runs puppet.
> >
> > Does that sound like correct behavior to you?
>
> That's actually where the dragons may be. I think anything prior to the
> 3.5 release (which IIRC is still considered alpha after being out for more
> than 1.5 years ;-)) required a lot of care when doing that type of manual
> rolling restart:
>
> http://www.benhallbenhall.com/2011/07/rolling-restart-in-apache-zookeeper-to-dynamically-add-servers-to-the-ensemble/
> is a good summary of how precisely the orchestration of steps had to be
> done in order to change the composition of the cluster.
>
> I'm not saying that your charm code doesn't do it exactly that way; all
> I'm saying is that ZK prior to 3.5 is SUPER brittle in this area.
>
> > If the other services keep
> > their own list of nodes via ensemble, then we'd need to add handlers on
> > each of those services that modify ensemble when Zookeeper notifies them
> > that it has changed ...
>
> But that's my point. Prior to the 3.5 release, you will have to signal out
> of band to restart the clients, AND only after you have confirmed a full
> rolling restart of the cluster, AND probably after the new nodes have
> caught up.
>
> Now, it's actually not the end of the world for clients not to know the
> full set of the ensemble. It's just that if the nodes they know about all
> go down, there will be no recourse.
>
> Thanks,
> Roman.
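For anyone following along: the zoo.cfg edit described above boils down to adding or dropping a `server.<id>=` line before the rolling restart. This is only an illustrative sketch, not the actual charm code; the helper name `update_server_list` and the hard-coded 2888/3888 ports are my assumptions. (And, per Roman's point, on ZK pre-3.5 rewriting the file is the easy part; the careful one-node-at-a-time restart ordering is where the dragons live.)

```python
# Hypothetical sketch of the zoo.cfg server-list update the charm performs.
# The real charm drives this through puppet; names here are invented.

def update_server_list(config_text, node_id, host, add=True):
    """Add or remove a 'server.<id>=<host>:2888:3888' entry in zoo.cfg text.

    Pre-3.5 ZooKeeper has no dynamic reconfiguration, so every node must
    receive the new list and be rolling-restarted for it to take effect.
    """
    # Drop any existing entry for this id, so the function is idempotent.
    lines = [line for line in config_text.splitlines()
             if not line.startswith("server.%d=" % node_id)]
    if add:
        lines.append("server.%d=%s:2888:3888" % (node_id, host))
    return "\n".join(lines) + "\n"
```

Each ZK node would rewrite its own copy of zoo.cfg this way, and then the nodes would be restarted one at a time (leader last) so that a quorum stays up throughout.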
