On Mon, Jun 13, 2016 at 1:01 PM, Pete Vander Giessen
<[email protected]> wrote:
> Hi Roman,
>
> Welcome back! I hope that you had an excellent vacation :-)

Thanks! It was a pretty awesome trek through Fränkische Schweiz.

>> I think Juju can help solve it with scaling up, but not down (and not
>> down-up). IOW, adding new nodes to the cluster should be no problem.
>> If X nodes go down and Juju needs to bring fresh instances back up,
>> I think you'll run into issues at the level of the ZK implementation
>> itself.
>
> We have a routine for removing nodes in the Juju charm. Juju won't re-use
> ids by default, and I think that it will automatically hit the
> "decrease_quorum" routine when a node goes down.

Can you please point me (with a link) to the code base so we can be sure we're
talking about the same thing? I'd like to see the implementation of this
"decrease_quorum" routine.

>> Definitely sounds reasonable going up (growing a cluster). But even in
>> that case, on the client side (e.g. anything like HBase or Giraph
>> actually using ZK as a coordination service) you'll be stuck with a
>> stale list of the ZK ensemble. Not sure how that can be helped.
>
> The charm currently does the following on each zk node whenever a node is
> added, or goes away:
>
> * Reads out the current list of nodes from the config (currently zoo.cfg,
> though I have a TODO to make it just read stuff out of that ensemble value,
> instead).
> * Adds or removes the node in question from the list.
> * Writes out the new list to ensemble.
> * Re-runs puppet.
>
> Does that sound like correct behavior to you?

That's actually where the dragons may be. I think anything prior to the 3.5
release (which IIRC is still considered alpha after being out for more than
1.5 years ;-)) required a lot of care when doing that type of manual rolling
restart:

http://www.benhallbenhall.com/2011/07/rolling-restart-in-apache-zookeeper-to-dynamically-add-servers-to-the-ensemble/

is a good summary of how precisely the orchestration of steps had to be done
in order to change the composition of the cluster.
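From memory, the brittle part is the ordering of the restarts, not the config
edits themselves. Here's a rough sketch of the pre-3.5 procedure as I recall
it (hostnames, ids, and the leader lookup are made up for illustration; this
is just the ordering constraint, not real orchestration code):

```python
# Sketch of the pre-3.5 rolling-restart ordering when growing an
# ensemble. Pre-3.5, every server's zoo.cfg must already list the
# full new ensemble before that server is (re)started, and the
# order matters:
#   1. start the brand-new servers first, so they can sync from
#      the current quorum;
#   2. restart the existing followers one at a time;
#   3. restart the current leader last, to avoid triggering an
#      extra leader election mid-way through.

def restart_order(existing, leader, new_nodes):
    """Return the order in which servers should be (re)started
    when adding new_nodes to an existing ensemble."""
    followers = [s for s in existing if s != leader]
    return list(new_nodes) + followers + [leader]

order = restart_order(["zk1", "zk2", "zk3"], "zk3", ["zk4", "zk5"])
print(order)  # new servers, then followers, then the leader
```

If the charm restarts nodes in an arbitrary order, or restarts the leader
while a new node is still syncing, that's exactly where quorum can be lost.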

I'm not saying that your charm code doesn't do it exactly that way; all I'm
saying is that ZK prior to 3.5 is SUPER brittle in this area.

> If the other services keep
> their own list of nodes via ensemble, then we'd need to add handlers on
> each of those services that modify ensemble, when Zookeeper notifies them
> that it has changed ...

But that's my point. Prior to the 3.5 release you will have to signal out of
band to restart the clients, AND only after you've confirmed a full rolling
restart of the cluster, AND probably only after the new nodes have caught up.
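In other words, the out-of-band signal to clients has to wait on two
conditions. Something like this (entirely hypothetical names, just to make
the gate explicit):

```python
# Hypothetical orchestration gate: only push the new connect string
# to clients once (a) every server in the new ensemble has been
# rolling-restarted with the new config AND (b) all newly added
# nodes report that they have caught up with the leader.

def safe_to_notify_clients(restarted, ensemble, synced, new_nodes):
    """True once the full rolling restart is confirmed and every
    new node has synced; before that, restarting clients is unsafe."""
    return (set(ensemble) <= set(restarted)
            and set(new_nodes) <= set(synced))

# Mid-restart, with the new node still syncing: not safe yet.
print(safe_to_notify_clients(
    restarted={"zk1", "zk2"},
    ensemble={"zk1", "zk2", "zk3"},
    synced=set(),
    new_nodes={"zk4"}))
```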

Now, it's actually not the end of the world for clients not to know the full
ensemble. It's just that if the nodes they know about all go down, there will
be no recourse.

Thanks,
Roman.
