> Dragons are interesting. I will give the link a read through, and think about what I need to do in order to validate that our charm plays nicely with them. :-)
Just a quick follow up to this: After some internal discussion, we are
going to switch the charm over to prompting the user to manually restart
the Zookeeper nodes after quorum changes. The charm can still handle the
configuration aspects, but it will be up to a human operator to do the
rolling restart and check to make sure that everything is still working
well. We do have a layer that facilitates rolling restarts, so adding that
in is a possible feature down the line when/if we can figure out how to
implement checks to make sure that the restart didn't break anything.

~ PeteVG

On Mon, Jun 13, 2016 at 4:48 PM Pete Vander Giessen <[email protected]> wrote:

> Hi Roman,
>
> The code in question lives here (reactive handlers):
>
> https://github.com/juju-solutions/bigtop/blob/zookeeper/bigtop-packages/src/charm/zookeeper/layer-zookeeper/reactive/zookeeper.py
>
> And here (lib class that actually does the work):
>
> https://github.com/juju-solutions/bigtop/blob/zookeeper/bigtop-packages/src/charm/zookeeper/layer-zookeeper/lib/charms/layer/zookeeper.py
>
> > That's actually where the dragons may be.
>
> Dragons are interesting. I will give the link a read through, and think
> about what I need to do in order to validate that our charm plays nicely
> with them. :-)
>
> On Mon, Jun 13, 2016 at 4:35 PM Roman Shaposhnik <[email protected]> wrote:
>
>> On Mon, Jun 13, 2016 at 1:01 PM, Pete Vander Giessen
>> <[email protected]> wrote:
>> > Hi Roman,
>> >
>> > Welcome back! I hope that you had an excellent vacation :-)
>>
>> Thanks! It was a pretty awesome trek through Fränkische Schweiz.
>>
>> >> I think Juju can help solve it with scaling up, but not down (and not
>> >> down-up). IOW, adding new nodes to the cluster should be no problem.
>> >> If X nodes go down and Juju needs to bring fresh instances back up I
>> >> think you'll run into issues at the level of ZK implementation itself.
>> >
>> > We have a routine for removing nodes in the Juju charm.
>> > Juju won't re-use ids by default, and I think that it will
>> > automatically hit the "decrease_quorum" routine when a node goes down.
>>
>> Can you please point me (link) to a code base so we can be sure we're
>> talking about the same thing? I'd like to see the implementation of
>> this "decrease_quorum" routine.
>>
>> >> Definitely sounds reasonable going up (growing a cluster). But even
>> >> in that case, on the client side (e.g. anything like HBase or Giraph
>> >> actually using ZK as a coordination service) you'll be stuck with a
>> >> stale list of ZK ensemble. Not sure how that can be helped.
>> >
>> > The charm currently does the following on each zk node whenever a
>> > node is added, or goes away:
>> >
>> > * Reads out the current list of nodes from the config (currently
>> >   zoo.cfg, though I have a TODO to make it just read stuff out of
>> >   that ensemble value, instead).
>> > * Adds or removes the node in question from the list.
>> > * Writes out the new list to ensemble.
>> > * Re-runs puppet.
>> >
>> > Does that sound like correct behavior to you?
>>
>> That's actually where the dragons may be. I think anything prior to
>> the 3.5 release (which IIRC is still considered alpha after being out
>> for more than 1.5 years ;-)) required a lot of care when doing that
>> type of a manual rolling restart:
>>
>> http://www.benhallbenhall.com/2011/07/rolling-restart-in-apache-zookeeper-to-dynamically-add-servers-to-the-ensemble/
>> is a good summary of how precisely the orchestration of steps had to
>> be done in order to change the composition of the cluster.
>>
>> I'm not saying that your Charms code doesn't do it exactly that way,
>> all I'm saying is that ZK prior to 3.5 is SUPER brittle in this area.
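[For reference, the four steps Pete lists above can be sketched roughly as
follows. This is a minimal illustration, not the actual bigtop charm code;
the function names and the zoo.cfg parsing details are assumptions.]

```python
import re

# zoo.cfg ensemble entries look like: server.N=host:2888:3888
SERVER_RE = re.compile(r"^server\.(\d+)=(.*)$")

def read_servers(zoo_cfg_text):
    """Step 1: read the current node list out of zoo.cfg contents,
    returning {id: "host:peer_port:election_port"}."""
    servers = {}
    for line in zoo_cfg_text.splitlines():
        m = SERVER_RE.match(line.strip())
        if m:
            servers[int(m.group(1))] = m.group(2)
    return servers

def update_servers(servers, node_id, address=None):
    """Step 2: add the node (when an address is given) or remove it
    (when address is None). Juju does not re-use ids by default, so a
    departed id never comes back."""
    new = dict(servers)
    if address is None:
        new.pop(node_id, None)
    else:
        new[node_id] = address
    return new

def render_servers(servers):
    """Step 3: render the new list back into 'server.N=' lines; the
    charm would write this out to the ensemble value, then (step 4)
    re-run puppet to apply it."""
    return "\n".join(
        "server.%d=%s" % (sid, addr) for sid, addr in sorted(servers.items())
    )
```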
>> > If the other services keep their own list of nodes via ensemble,
>> > then we'd need to add handlers on each of those services that modify
>> > ensemble, when Zookeeper notifies them that it has changed ...
>>
>> But that's my point. Prior to the 3.5 release you will have to signal
>> out of band to restart the client AND only after you confirmed a full
>> rolling restart of the cluster AND probably after the new nodes have
>> caught up.
>>
>> Now, it's actually not the end of the world for clients not to know
>> the full set of the ensemble. It's just that if the nodes they know
>> about all go down -- there will be no recourse.
>>
>> Thanks,
>> Roman.
>>
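[For reference, the careful pre-3.5 orchestration discussed in the thread
above — restart followers one at a time, leader last, and confirm each
node is healthy before moving on — can be sketched roughly like this.
It is a hypothetical illustration, not charm code; only the `ruok`/`imok`
four-letter-word health check is standard ZooKeeper behavior, and the
actual restart mechanism is left as a placeholder.]

```python
import socket

def four_letter_word(host, port, cmd=b"ruok", timeout=5.0):
    """Send a ZooKeeper four-letter command and return the raw reply.
    A healthy server answers b"ruok" with "imok"."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd)
        return sock.recv(512).decode("ascii", errors="replace")

def restart_order(servers, leader):
    """Restart followers one at a time and the leader last, to avoid
    triggering more leader elections than necessary."""
    followers = [s for s in servers if s != leader]
    return followers + [leader]

def rolling_restart(servers, leader, restart_fn, client_port=2181):
    """Restart each node in order, verifying health before moving on.
    restart_fn is whatever actually bounces the service on a node
    (e.g. an ssh or juju-run wrapper); it is a placeholder here."""
    for host in restart_order(servers, leader):
        restart_fn(host)
        if four_letter_word(host, client_port).strip() != "imok":
            raise RuntimeError("%s did not come back healthy; stopping" % host)
```

[Even with the ordering right, Roman's point stands: clients still hold a
stale server list until they are restarted out of band, after the full
rolling restart has been confirmed.]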
