Hi Roman,

The code in question lives here (reactive handlers):
https://github.com/juju-solutions/bigtop/blob/zookeeper/bigtop-packages/src/charm/zookeeper/layer-zookeeper/reactive/zookeeper.py
And here (lib class that actually does the work):
https://github.com/juju-solutions/bigtop/blob/zookeeper/bigtop-packages/src/charm/zookeeper/layer-zookeeper/lib/charms/layer/zookeeper.py

> That's actually where the dragons may be.

Dragons are interesting. I will give the link a read-through, and think about what I need to do in order to validate that our charm plays nicely with them. :-)

On Mon, Jun 13, 2016 at 4:35 PM Roman Shaposhnik <[email protected]> wrote:
> On Mon, Jun 13, 2016 at 1:01 PM, Pete Vander Giessen
> <[email protected]> wrote:
> > Hi Roman,
> >
> > Welcome back! I hope that you had an excellent vacation :-)
>
> Thanks! It was a pretty awesome trek through Fränkische Schweiz.
>
> >> I think Juju can help solve it with scaling up, but not down (and not
> >> down-up).
> >> IOW, adding new nodes to the cluster should be no problem. If X nodes go
> >> down and Juju needs to bring fresh instances back up I think you'll run
> >> into issues at the level of ZK implementation itself.
> >
> > We have a routine for removing nodes in the Juju charm. Juju won't re-use
> > ids by default, and I think that it will automatically hit the
> > "decrease_quorum" routine when a node goes down.
>
> Can you please point me (link) to a code base so we can be sure we're
> talking about the same thing? I'd like to see the implementation of this
> "decrease_quorum" routine.
>
> >> Definitely sounds reasonable going up (growing a cluster). But even in
> >> that case, on the client side (e.g. anything like HBase or Giraph
> >> actually using ZK as a coordination service) you'll be stuck with a
> >> stale list of the ZK ensemble.
> >> Not sure how that can be helped.
> >
> > The charm currently does the following on each ZK node whenever a node is
> > added, or goes away:
> >
> > * Reads out the current list of nodes from the config (currently zoo.cfg,
> >   though I have a TODO to make it just read stuff out of that ensemble
> >   value, instead).
> > * Adds or removes the node in question from the list.
> > * Writes out the new list to ensemble.
> > * Re-runs puppet.
> >
> > Does that sound like correct behavior to you?
>
> That's actually where the dragons may be. I think anything prior to the
> 3.5 release (which IIRC is still considered alpha after being out for more
> than 1.5 years ;-)) required a lot of care when doing that type of manual
> rolling restart:
>
> http://www.benhallbenhall.com/2011/07/rolling-restart-in-apache-zookeeper-to-dynamically-add-servers-to-the-ensemble/
> is a good summary of how precisely the orchestration of steps had to be
> done in order to change the composition of the cluster.
>
> I'm not saying that your charm code doesn't do it exactly that way; all
> I'm saying is that ZK prior to 3.5 is SUPER brittle in this area.
>
> > If the other services keep
> > their own list of nodes via ensemble, then we'd need to add handlers on
> > each of those services that modify ensemble when Zookeeper notifies them
> > that it has changed ...
>
> But that's my point. Prior to the 3.5 release, you will have to signal out
> of band to restart the clients, AND only after you have confirmed a full
> rolling restart of the cluster, AND probably after the new nodes have
> caught up.
>
> Now, it's actually not the end of the world for clients not to know the
> full set of the ensemble. It's just that if the nodes they know about all
> go down, there will be no recourse.
>
> Thanks,
> Roman.
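For anyone following along: the zoo.cfg edit described above boils down to adding or dropping a `server.<id>=` line before the rolling restart. This is only an illustrative sketch, not the actual charm code; the helper name `update_server_list` and the hard-coded 2888/3888 ports are my assumptions. (And, per Roman's point, on ZK pre-3.5 rewriting the file is the easy part; the careful one-node-at-a-time restart ordering is where the dragons live.)

```python
# Hypothetical sketch of the zoo.cfg server-list update the charm performs.
# The real charm drives this through puppet; names here are invented.

def update_server_list(config_text, node_id, host, add=True):
    """Add or remove a 'server.<id>=<host>:2888:3888' entry in zoo.cfg text.

    Pre-3.5 ZooKeeper has no dynamic reconfiguration, so every node must
    receive the new list and be rolling-restarted for it to take effect.
    """
    # Drop any existing entry for this id, so the function is idempotent.
    lines = [line for line in config_text.splitlines()
             if not line.startswith("server.%d=" % node_id)]
    if add:
        lines.append("server.%d=%s:2888:3888" % (node_id, host))
    return "\n".join(lines) + "\n"
```

Each ZK node would rewrite its own copy of zoo.cfg this way, and then the nodes would be restarted one at a time (leader last) so that a quorum stays up throughout.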
