[Hadoop Wiki] Update of "ZooKeeper/FAQ" by ChangSong

Apache Wiki Thu, 04 Nov 2010 22:14:33 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.


The "ZooKeeper/FAQ" page has been changed by ChangSong.
The comment on this change is: Added section 8. Can I run an ensemble cluster 
behind a load balancer?.
http://wiki.apache.org/hadoop/ZooKeeper/FAQ?action=diff&rev1=13&rev2=14

--------------------------------------------------

  
  See [[http://bit.ly/4ekN8G|this page]] for a survey Patrick Hunt 
(http://twitter.com/phunt) did looking at operational latency with both 
standalone server and an ensemble of size 3. You'll notice that a single core 
machine running a standalone ZK ensemble (1 server) is still able to process 
15k requests per second. This is orders of magnitude greater than what most 
applications require (if they are using ZooKeeper correctly - ie as a 
coordination service, and not as a replacement for a database, filestore, 
cache, etc...)
  
+ 
+ <<BR>>
+ <<Anchor(8)>>
+ '''8. [[#8|Can I run an ensemble cluster behind a load balancer?]]'''
+ 
+ 
+ There are two types of servers failures in distributed system from socket I/O 
perspective.
+ 
+  1. server down due to hardware failures and OS panic/hang, Zookeeper daemon 
hang, temporary/permanent network outage, network switch anomaly, etc: client 
cannot figure out failures immediately since there is no responding entities. 
As a result, zookeeper clients must rely on timeout to identify failures.
+ 
+  1. Dead zookeeper process (daemon): since OS will respond to closed TCP 
port, client will get "connection refused" upon socket connect or "peer reset" 
on socket I/O. Client immediately notice that the other end failed.
+ 
+ 
+ Here's how ZK clients respond to servers in each case.
+ 
+  1. In this case (former), ZK client rely on heartbeat algorithm. ZK clients 
detects server failures in 2/3 of recv timeout (Zookeeper_init), and then it 
retries the same IP at every recv timeout period if only one of ensemble is 
given. If more than two ensemble IP are given, ZK clients will try next IP 
immediately.
+ 
+  1. In this scenario, ZK client will immediately detect failure, and will 
retry connecting every second assuming only one ensemble IP is given. If 
multiple ensemble IP is given (most installation falls into this category), ZK 
client retries next IP immediately.
+ 
+ Notice that in both cases, when more than one ensemble IP is specified, ZK 
clients retry next IP immediately with no delay.
+ 
+ On some installations, it is preferable to run an ensemble cluster behind a 
load balancer such as hardware L4 switch, TCP reverse proxy, or DNS round-robin 
because such setup allows users to simply use one hostname or IP (or VIP) for 
ensemble cluster, and some detects server failures as well.
+ 
+ But there are subtle differences on how these load balancers will react upon 
server failures.
+ 
+  * Hardware L4 load balancer: this setup involves one IP and a hostname. L4 
switch usually does heartbeat on its own, and thus removes non-responding host 
from its IP list. But this also relies on the same timeout scheme for fault 
detection. L4 may redirect you to a unresponsive server. If hardware LB detect 
server failures fast enough, this setup will always redirect you to live 
ensemble server. 
+ 
+  * DNS round robin: this setup involves one hostname and a list of IPs. ZK 
clients correctly make used of a list of IPs returned by DNS query. Thus this 
setup works the same way as multiple hostname (IP) argument to zookeeper_init. 
The drawback is that when an ensemble cluster configuration changes like server 
addition/removal, it may take a while to propagate the DNS entry change in all 
DNS servers and DNS client caching (nscd for example) TTL issue.
+ 
+ In conclusion, DNS RR works as good as a list of ensemble IP arguments except 
cluster reconfiguration case.
+

[Hadoop Wiki] Update of "ZooKeeper/FAQ" by ChangSong

Reply via email to