Hi, I have been using ZooKeeper and Curator for some time for our server deployments running in AWS. This has helped a lot in our day-to-day operations, as most server failures are now handled automatically.
One of the issues I have been looking into is how to recover when one of the ZooKeeper server instances goes bad and needs replacement. That is now almost all nicely handled, but I am left with one problem: how can the ZooKeeper clients connect to the newly created ZooKeeper server instance? I have not been able to make that work without changing the ZooKeeper code a little bit.

The current setup works roughly as follows. I am using 5 elastic IPs for the 5-server cluster, and zoo.cfg lists those IP addresses directly - no DNS lookup is needed by the ZooKeeper servers. This way, when I replace a ZooKeeper server, I only need to reassign its elastic IP to the new instance and the cluster grows back to 5 servers. That works great and is already automated: the ZooKeeper cluster is now a self-healing ASG with a little extra scripting deployed inside the image.

For the connections of our hundreds of application servers to the ZooKeeper cluster, I have set up the connectString with logical hostnames like zookeeperX.applicationname.com, each mapped via a (static) CNAME record to AWS's typical public hostnames like ec2-A-B-C-D.compute-1.amazonaws.com, where A.B.C.D is the elastic IP. The reason for using the hostname and not the elastic IP address directly in the connect string is so that AWS can correctly resolve the security groups: if I used the IP addresses, I would need to add each application server's IP to the ZooKeeper cluster's security group, which is really annoying as I have hundreds of them, and they can change from day to day or week to week. If I use the ec2-A-B-C-D name, I only need to add the application servers' security groups once, and any application server launched in those groups can connect to e.g.
zookeeper1.applicationname.com:2181, as it maps to the ec2 name, which maps to the server's currently mapped private IP address, while maintaining the source security group. Great so far.

If I now terminate a ZooKeeper server instance, the ASG will bring a new one up, the scripting inside it will reassign the elastic IP, and that server joins the ZooKeeper cluster. A few minutes later, the DNS lookup of that server's zookeeperX.applicationname.com name will change to the new private IP. So far so good. But the application servers will not be able to reconnect to the replacement instance: its private IP has changed, and since the Curator + ZooKeeper client code uses the StaticHostProvider, the name-to-IP relation is looked up only once and will not change until a restart. I don't want to restart our Curator + ZooKeeper clients, as that causes downtime for the application. I want the sessions to continue, so leader latches remain owned by their current owners.

My proposed solution requires a rather trivial change to the ZooKeeper class: add constructors which allow passing in any user-provided implementation of HostProvider, and change the member now defined as "private final StaticHostProvider hostProvider" to be of type HostProvider. With those changes, anyone who wants can provide their own implementation of the already-defined HostProvider interface and pass it to the ZooKeeper class. In practice I create a subclass of ZooKeeper which also adds some extra metrics and JMX options specific to our setup. It all integrates just fine with Curator. I have been testing with that setup, building a simple LateResolvingHostProvider which does the name-to-IP lookup when next() is called, and thus it can readily connect to the replacement server instance once the DNS lookup result has changed - typically about 5 minutes after the replacement.
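To illustrate why the one-time lookup is the sticking point: a plain java.net.InetSocketAddress resolves its hostname at construction time and then holds on to that IP forever, which is essentially what happens to the addresses parsed from the connect string. A small standalone sketch (using localhost so it runs without network access):

```java
import java.net.InetSocketAddress;

public class ResolveOnceDemo {
    public static void main(String[] args) {
        // A resolved InetSocketAddress captures the IP at construction;
        // if DNS later points the name elsewhere, this object still
        // carries the old address.
        InetSocketAddress eager = new InetSocketAddress("localhost", 2181);
        System.out.println("resolved to: " + eager.getAddress().getHostAddress());

        // createUnresolved() keeps only the hostname, deferring the DNS
        // lookup until a fresh InetSocketAddress is built from it - the
        // trick a late-resolving provider can use on every next() call.
        InetSocketAddress lazy =
                InetSocketAddress.createUnresolved("localhost", 2181);
        System.out.println("still unresolved: " + lazy.isUnresolved());
    }
}
```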
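For the curious, a minimal standalone sketch of the idea behind such a LateResolvingHostProvider. The real interface is org.apache.zookeeper.client.HostProvider, which declares these same three methods; it is reproduced inline here only so the example compiles without the ZooKeeper jar, and details like honoring spinDelay are omitted:

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Reproduced inline so the sketch compiles standalone; the real
// interface lives at org.apache.zookeeper.client.HostProvider.
interface HostProvider {
    int size();
    InetSocketAddress next(long spinDelay);
    void onConnected();
}

// Sketch of a late-resolving provider: it keeps the server list as
// *unresolved* host:port pairs and performs the DNS lookup on every
// next() call, so a replacement instance behind the same DNS name is
// picked up as soon as the record changes.
class LateResolvingHostProvider implements HostProvider {
    private final List<InetSocketAddress> servers; // unresolved host:port pairs
    private int current = -1;

    LateResolvingHostProvider(List<InetSocketAddress> unresolvedServers) {
        servers = new ArrayList<>(unresolvedServers);
        Collections.shuffle(servers); // spread clients across the ensemble
    }

    @Override
    public int size() {
        return servers.size();
    }

    @Override
    public InetSocketAddress next(long spinDelay) {
        // spinDelay (the pause the client wants after exhausting the
        // whole list) is ignored in this sketch.
        current = (current + 1) % servers.size();
        InetSocketAddress s = servers.get(current);
        // Fresh lookup: constructing a resolved InetSocketAddress from
        // the stored hostname triggers DNS resolution right now.
        return new InetSocketAddress(s.getHostString(), s.getPort());
    }

    @Override
    public void onConnected() {
        // no bookkeeping needed in this sketch
    }
}
```

With the proposed constructor change, an instance like this could be handed to the ZooKeeper class in place of the default StaticHostProvider.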
Using a bit of automation from AWS - auto-scaling groups and a simple script inside the images I use for ZooKeeper - I can now terminate any single ZooKeeper server instance, and automatically the replacement comes back, gets the elastic IP, joins the other ZooKeeper servers, syncs up, and joins the cluster. The application servers have already moved to the next() server by that time, but if needed later on, they will be able to reconnect to that replacement server. Just as a concept, I can also trigger all the application servers to drop their current connections and reconnect to the first name in their (shuffled) list of ZooKeeper servers - without losing any sessions, locks, leaders, etc. - in effect rebalancing the connections across the servers.

Sorry for the lengthy description, but I hope it clarifies a bit why I would like this change to be considered. I actually submitted a patch already (https://issues.apache.org/jira/browse/ZOOKEEPER-2107), but probably did not follow the right steps as a first-timer - so I am now retrying using the guidelines at http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute :)

kind regards,
Robert Kamphuis
