Hi,

I have been using ZooKeeper and Curator for some time for our server 
deployments running in AWS. This has helped a lot in our day-to-day operations, 
as most server failures are now handled automatically.

One of the issues I have been looking into is how to recover when one of 
the zookeeper-server instances goes bad and needs replacement. That is now 
almost all handled nicely, but I am left with one problem: how can the 
zookeeper clients on our application servers connect to the newly created 
zookeeper-server instance? I have not been able to make that work without 
changing the ZooKeeper code a little bit.

The current setup works roughly as follows: I am using 5 elastic IPs for the 
5-server cluster, and zoo.cfg lists those IP addresses directly, so no 
DNS lookup is needed by the zookeeper-servers. When I replace a zookeeper 
server, I reassign its elastic IP to the new instance and the 
zookeeper-cluster grows back to 5 servers. This works great and is already 
automated: the zookeeper-cluster is now a self-healing ASG with a little 
extra scripting deployed inside the image.

For the connections from our hundreds of application servers to the zookeeper 
cluster, I have set up the connectString with logical hostnames like 
zookeeperX.applicationname.com, each mapped via a (static) CNAME record to 
AWS's typical public hostname ec2-A-B-C-D.compute-1.amazonaws.com, where 
A.B.C.D is the elastic IP. The reason for using the hostname rather than the 
elastic IP address directly in the connectString is so that AWS can correctly 
resolve the security groups: if I used the IP addresses, I would need to 
provision each application server's IP into the zookeeper-cluster security 
group, which is really annoying as I have hundreds of them and they can change 
from day to day or week to week. With the ec2-A-B-C-D name I only need to add 
the application servers' security groups once, and any application server 
launched in those groups can connect to e.g. 
zookeeper1.applicationname.com:2181, as it maps to the ec2 name, which maps to 
the server's current private IP address, while maintaining the source security 
group. Great so far.

If I now terminate a zookeeper-server instance, the ASG will bring up a new 
one, the scripting inside it will reassign the elastic IP, and that server 
joins the zookeeper-cluster. A few minutes later, the DNS lookup of that 
server's zookeeperX.applicationname.com name will resolve to the new private 
IP. So far so good.
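To make the moving parts concrete, here is a rough sketch of the configuration 
described above. The hostnames and addresses are the placeholders from my 
description (A.B.C.D etc.), not real values, and the port numbers are just the 
ZooKeeper defaults:

```
# zoo.cfg on each zookeeper-server: the 5 elastic IPs listed directly,
# so the servers themselves never depend on a DNS lookup.
server.1=A.B.C.1:2888:3888
server.2=A.B.C.2:2888:3888
server.3=A.B.C.3:2888:3888
server.4=A.B.C.4:2888:3888
server.5=A.B.C.5:2888:3888

# DNS: one static CNAME per logical name, pointing at the elastic IP's
# public ec2 hostname (which AWS resolves to the current private IP,
# preserving the source security group):
#   zookeeper1.applicationname.com  CNAME  ec2-A-B-C-1.compute-1.amazonaws.com

# connectString used by the application servers (logical names, not IPs):
#   zookeeper1.applicationname.com:2181,zookeeper2.applicationname.com:2181,...
```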

But the application servers will not be able to reconnect to the replacement 
instance: its private IP has changed, and since the curator+zookeeper-client 
code uses the StaticHostProvider, the name-to-IP relation is looked up only 
once and will not change until a restart. I don't want to restart our 
curator+zookeeper-clients, as that causes downtime for the application. I want 
the sessions to continue, so leader latches remain owned by their current 
owners.

My proposed solution requires a rather trivial change to the ZooKeeper class: 
add constructors which allow passing in any user-provided implementation of 
HostProvider, and change the member currently defined as "private final 
StaticHostProvider hostProvider" to be of type HostProvider. With those 
changes, anyone who wants can provide their own implementation of the 
already-defined HostProvider interface and pass it to the ZooKeeper class. In 
practice I create a subclass of ZooKeeper which also adds some extra metrics 
and JMX options specific to our setup; it all integrates just fine with 
Curator. I have been testing with that setup, building a simple 
LateResolvingHostProvider which does the name-to-IP lookup when next() is 
called, and thus it can readily connect to the replacement server instance 
once the DNS lookup result has changed - which is typically about 5 minutes 
after the replacement.
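The core of such a late-resolving provider can be sketched in a few lines. 
This is only an illustration of the idea, not my actual patch: the stand-in 
interface below mirrors the method shapes of 
org.apache.zookeeper.client.HostProvider, and the sketch ignores the spinDelay 
parameter that the real interface uses for throttling when the whole list has 
been tried:

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.List;

// Stand-in with the same method shapes as org.apache.zookeeper.client.HostProvider;
// a real implementation would implement the ZooKeeper interface directly.
interface HostProvider {
    int size();
    InetSocketAddress next(long spinDelay);
    void onConnected();
}

// Sketch of the idea behind a LateResolvingHostProvider: the hostname-to-IP
// lookup happens on every next() call instead of once at construction, so a
// changed DNS record is picked up without restarting the client.
class LateResolvingHostProvider implements HostProvider {
    private final List<InetSocketAddress> servers; // unresolved host:port pairs
    private int current = -1;

    LateResolvingHostProvider(List<InetSocketAddress> unresolvedServers) {
        if (unresolvedServers.isEmpty()) {
            throw new IllegalArgumentException("empty server list");
        }
        this.servers = unresolvedServers;
    }

    @Override
    public int size() {
        return servers.size();
    }

    @Override
    public InetSocketAddress next(long spinDelay) { // spinDelay ignored in this sketch
        current = (current + 1) % servers.size();
        InetSocketAddress unresolved = servers.get(current);
        try {
            // Fresh DNS lookup on every call - this is the whole point.
            InetAddress addr = InetAddress.getByName(unresolved.getHostString());
            return new InetSocketAddress(addr, unresolved.getPort());
        } catch (UnknownHostException e) {
            // Name not resolvable right now; return it unresolved and let the
            // client move on to the next server in the list.
            return unresolved;
        }
    }

    @Override
    public void onConnected() {
        // No reconnect bookkeeping in this sketch.
    }
}
```

The server list must be built with InetSocketAddress.createUnresolved(), 
otherwise the JVM resolves the names once at construction and the late lookup 
never happens.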

Using a bit of automation from AWS - auto-scaling groups and a simple script 
inside the images I use for zookeeper - I can now terminate any single 
zookeeper-server instance, and the replacement automatically comes back, gets 
the elastic IP, joins the other zookeeper-servers, syncs up, and joins the 
zookeeper-cluster. The application servers have already moved to the next() 
server by that time, but if needed later on, they will be able to reconnect to 
that replacement server. As a proof of concept, I can also trigger all the 
application servers to drop their current connections and reconnect to the 
first name in their (shuffled) list of zookeeper-servers - without losing any 
sessions, locks, leaders, etc. - in effect rebalancing the connections across 
the servers.

Sorry for the lengthy description, but I hope it clarifies a bit why I would 
like that change to be considered.

I actually submitted a patch already 
(https://issues.apache.org/jira/browse/ZOOKEEPER-2107), but as a first-timer I 
probably did not follow the right steps - so I am now retrying using the 
guidelines at http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute :)

kind regards,
Robert Kamphuis



