Hi Roman,

Welcome back! I hope that you had an excellent vacation :-)
> I think Juju can help solve it with scaling up, but not down (and not
> down-up). IOW, adding new nodes to the cluster should be no problem. If X
> nodes go down and Juju needs to bring fresh instances back up I think
> you'll run into issues at the level of ZK implementation itself.

We have a routine for removing nodes in the Juju charm. Juju won't re-use
ids by default, and I think that it will automatically hit the
"decrease_quorum" routine when a node goes down. I will make a note to
double-check its behavior, and confirm with the [email protected] list
that we're doing the right thing.

> Definitely sounds reasonable going up (growing a cluster). But even in
> that case, on the client side (e.g. anything like HBase or Giraph
> actually using ZK as a coordination service) you'll be stuck with a
> stale list of ZK ensemble. Not sure how that can be helped.

The charm currently does the following on each ZK node whenever a node is
added or goes away:

* Reads the current list of nodes from the config (currently zoo.cfg,
  though I have a TODO to make it just read that list out of the ensemble
  value instead).
* Adds or removes the node in question from the list.
* Writes out the new list to ensemble.
* Re-runs puppet.

Does that sound like correct behavior to you? If the other services keep
their own list of nodes via ensemble, then we'd need to add handlers on
each of those services that modify ensemble when Zookeeper notifies them
that it has changed ...

Thank you,
~ PeteVG

On Mon, Jun 13, 2016 at 9:45 AM MrAsanjar . <[email protected]> wrote:

> Roman,
> If the zookeeper charm has been developed correctly, with PEERS declared
> in metadata.yaml, then every time any member of the zookeeper quorum
> terminates, a zookeeper-quorum-departed event will be sent to all other
> services (i.e. HBase).
>
> ```
> tags: ["bigdata", "hadoop", "apache"]
>
> provides:
>   zookeeper:
>     interface: zookeeper
> peers:
>   quorum:
>     interface: zookeeper-quorum
> ```
>
> On Sat, Jun 11, 2016 at 10:21 PM, Roman Shaposhnik <[email protected]>
> wrote:
>
> > Sorry for a belated reply -- was traveling and only now catching up
> > with the email.
> >
> > On Fri, Jun 3, 2016 at 4:00 PM, Pete VanderGiessen
> > <[email protected]> wrote:
> > > Hi All,
> > >
> > > I created a JIRA ticket. Apologies in advance if I've committed any
> > > sins categorizing and tagging it (would be happy to fix them if so):
> > > https://issues.apache.org/jira/browse/BIGTOP-2467
> > >
> > > @Roman: hi. Thank you for the detailed reply :-)
> > >
> > > > Theoretically, you can add nodes to the ZK cluster (provided that
> > > > you increment the identity ID) but if the node goes down you have
> > > > to maintain the identity mapping somehow. Which leads to a
> > > > situation where most ZK clusters are very static in nature. You
> > > > only bring nodes back from the dead -- you don't really scale the
> > > > cluster up and down.
> > >
> > > I think that the juju charm I'm working on can help solve some of
> > > the technical issues with scaling up and down.
> >
> > I think Juju can help solve it with scaling up, but not down (and not
> > down-up). IOW, adding new nodes to the cluster should be no problem.
> > If X nodes go down and Juju needs to bring fresh instances back up I
> > think you'll run into issues at the level of ZK implementation
> > itself. Quick aside: to get an authoritative answer I think this part
> > of the discussion should really go to [email protected].
> >
> > > We have a mapping of unique ids in the form of the unit id that we
> > > track for each node, and handlers for the cases where nodes come up
> > > or go down. What I'd like to do is change the list the puppet
> > > script expects to find in its ensemble variable from this:
> > >
> > > ["<some ip>:2888:3888", "<some other ip>:2888:3888"]
> > >
> > > To this:
> > >
> > > [[<some id>, "<some ip>:2888:3888"], [<some other id>, "<some other
> > > ip>:2888:3888"]]
> > >
> > > I think that helps you out in cases where you're not using a charm,
> > > too, as you can determine your ids and ips in advance, and then
> > > pass a correctly ordered and identified list to puppet on each node.
> >
> > Definitely sounds reasonable going up (growing a cluster). But even
> > in that case, on the client side (e.g. anything like HBase or Giraph
> > actually using ZK as a coordination service) you'll be stuck with a
> > stale list of ZK ensemble. Not sure how that can be helped.
> >
> > Thanks,
> > Roman.
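P.S. For anyone following along, here is a minimal sketch of the
add/remove bookkeeping I described above, written against the proposed
[[id, "ip:2888:3888"], ...] ensemble format. The function names and the
surrounding plumbing are hypothetical (this is not the actual charm code,
which also rewrites the config and re-runs puppet after updating the
list):

```python
# Hypothetical sketch of the charm's ensemble bookkeeping, using the
# proposed [[id, "ip:2888:3888"], ...] format. Names are illustrative
# only; the real charm hooks do this as part of the peer relation
# handlers and then re-run puppet.


def add_node(ensemble, node_id, address):
    """Return a new ensemble list with (node_id, address) appended.

    Ids are never re-used, so a node_id that is already present is
    left untouched rather than overwritten.
    """
    if any(nid == node_id for nid, _ in ensemble):
        return list(ensemble)
    return list(ensemble) + [[node_id, address]]


def remove_node(ensemble, node_id):
    """Return a new ensemble list with node_id removed."""
    return [[nid, addr] for nid, addr in ensemble if nid != node_id]


if __name__ == "__main__":
    ensemble = [[1, "10.0.0.1:2888:3888"]]
    ensemble = add_node(ensemble, 2, "10.0.0.2:2888:3888")
    ensemble = remove_node(ensemble, 1)
    print(ensemble)  # [[2, '10.0.0.2:2888:3888']]
```

Because ids travel with their addresses, a node that replaces a dead one
gets a fresh id instead of inheriting the old slot, which matches Juju's
default of never re-using unit ids.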
