Here is an interesting section from ZooKeeper's wiki: http://wiki.apache.org/hadoop/ZooKeeper/FAQ
My current thinking is that we will be leveraging a ZK ensemble cluster as part of the Knox HA cluster deployment. We should probably align as closely with this as possible for usability reasons.

*8. Can I run an ensemble cluster behind a load balancer? <http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A8>*

There are two types of server failures in a distributed system from a socket I/O perspective:

1. Server down due to hardware failure, OS panic/hang, ZooKeeper daemon hang, temporary/permanent network outage, network switch anomaly, etc.: the client cannot detect the failure immediately, since there is no responding entity. As a result, ZooKeeper clients must rely on timeouts to identify these failures.

2. Dead ZooKeeper process (daemon): since the OS will respond on the closed TCP port, the client gets "connection refused" on socket connect or "connection reset by peer" on socket I/O. The client notices immediately that the other end has failed.

Here is how ZK clients respond to servers in each case:

1. In the former case, the ZK client relies on a heartbeat algorithm. It detects the server failure within 2/3 of the recv timeout (set in zookeeper_init), and then retries the same IP every recv-timeout period if only one ensemble address was given. If two or more ensemble IPs were given, the ZK client tries the next IP immediately.

2. In this scenario, the ZK client detects the failure immediately and retries the connection every second, assuming only one ensemble IP was given. If multiple ensemble IPs were given (most installations fall into this category), the ZK client retries the next IP immediately.

Notice that in both cases, when more than one ensemble IP is specified, ZK clients retry the next IP immediately, with no delay.

On some installations it is preferable to run an ensemble cluster behind a load balancer such as a hardware L4 switch, a TCP reverse proxy, or DNS round-robin, because such a setup lets users address the ensemble cluster with a single hostname or IP (or VIP), and some load balancers detect server failures as well.
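The "try the next IP immediately, with no delay" behavior described above can be sketched in a few lines. This is an illustrative simulation, not the real ZooKeeper client; the ensemble addresses and the `is_alive` probe are made up for the example:

```python
# Illustrative sketch of ZK-client-style failover across an ensemble list:
# on a "connection refused"-type failure, move to the next address with no delay.
def next_live_server(servers, is_alive):
    """Return the first reachable server from the ensemble list, or None.

    servers: list of "host:port" strings (the ensemble connect string).
    is_alive: callable standing in for a TCP connect attempt.
    """
    for server in servers:
        if is_alive(server):
            return server
    return None  # every ensemble member appears down

ensemble = ["10.0.0.1:2181", "10.0.0.2:2181", "10.0.0.3:2181"]
down = {"10.0.0.1:2181"}  # pretend the first server refuses connections
print(next_live_server(ensemble, lambda s: s not in down))
# 10.0.0.2:2181
```

The real client additionally distinguishes the timeout case (heartbeat-based detection within 2/3 of the recv timeout) from the refused-connection case; the sketch only shows the immediate next-IP rotation common to both.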
But there are subtle differences in how these load balancers react to server failures.

- Hardware L4 load balancer: this setup involves one IP and one hostname. An L4 switch usually does its own heartbeating and removes non-responding hosts from its IP list, but it relies on the same timeout scheme for fault detection, so it may still redirect you to an unresponsive server. If the hardware LB detects server failures fast enough, this setup will always redirect you to a live ensemble server.

- DNS round-robin: this setup involves one hostname and a list of IPs. ZK clients correctly make use of the list of IPs returned by the DNS query, so this setup works the same way as passing multiple host arguments to zookeeper_init. The drawback is that when the ensemble cluster configuration changes (server addition/removal), it may take a while for the change to propagate to all DNS servers, and DNS client caching (nscd, for example) adds its own TTL issues. In conclusion, DNS RR works as well as a list of ensemble IP arguments, except in the cluster reconfiguration case.

It turns out that there is a minor problem with DNS RR: a tool such as zktop.py does not handle a list of host IPs returned by the DNS server.

On Tue, Dec 24, 2013 at 2:29 PM, larry mccay <[email protected]> wrote:

> Thank you, Maksim - this is well presented.
>
> Have you already posted or plan to post a similar description of using
> haproxy for Knox?
>
>
> On Tue, Dec 24, 2013 at 2:07 PM, Maksim Kononenko <
> [email protected]> wrote:
>
>> Hi all!
>>
>> Here are the results of "simulating" DNS Round Robin.
>>
>> For this purpose I used the BIND DNS server.
>> Here are links on how to install/configure it:
>>
>> https://www.digitalocean.com/community/articles/how-to-install-the-bind-dns-server-on-centos-6
>> http://www.centos.org/docs/2/rhl-rg-en-7.2/s1-bind-configuration.html
>> http://www.centos.org/docs/4/html/rhel-rg-en-4/s1-bind-zone.html
>>
>> Also, here are the steps that I executed to configure the BIND server.
>> It is
>> expected that the BIND server has been installed successfully:
>>
>> 1. Configure the "/etc/named.conf" file. I used the configuration
>> template from the links listed above:
>>
>> options {
>>     #listen-on port 53 { 127.0.0.1; };
>>     listen-on-v6 port 53 { ::1; };
>>     directory "/var/named";
>>     dump-file "/var/named/data/cache_dump.db";
>>     statistics-file "/var/named/data/named_stats.txt";
>>     memstatistics-file "/var/named/data/named_mem_stats.txt";
>>     allow-query { any; };
>>     allow-transfer { localhost; };
>>     recursion no;
>>
>>     dnssec-enable yes;
>>     dnssec-validation yes;
>>     dnssec-lookaside auto;
>>
>>     /* Path to ISC DLV key */
>>     bindkeys-file "/etc/named.iscdlv.key";
>>
>>     managed-keys-directory "/var/named/dynamic";
>> };
>>
>> logging {
>>     channel default_debug {
>>         file "data/named.run";
>>         severity dynamic;
>>     };
>> };
>>
>> zone "." IN {
>>     type hint;
>>     file "named.ca";
>> };
>>
>> zone "mydomain.com" IN {
>>     type master;
>>     file "mydomain.com.zone";
>>     allow-update { none; };
>> };
>>
>> include "/etc/named.rfc1912.zones";
>> include "/etc/named.root.key";
>>
>> 2. Configure the "/var/named/mydomain.com.zone" file. I used the
>> configuration template from the links listed above:
>>
>> $TTL 86400
>> @    IN  SOA  ns1.mydomain.com. root.mydomain.com. (
>>          2013042201  ;Serial
>>          3600        ;Refresh
>>          1800        ;Retry
>>          604800      ;Expire
>>          86400       ;Minimum TTL
>> )
>> ; Specify our nameserver
>>      IN  NS  ns1.mydomain.com.
>> ; Resolve the nameserver hostname to an IP; replace with your own address.
>> ns1  IN  A   192.168.56.104
>>
>> ; Define hostname -> IP pairs which you wish to resolve
>> knox IN  A   192.168.56.105
>> knox IN  A   192.168.56.106
>>
>> Here:
>> 192.168.56.104 - the nameserver IP; BIND is installed here.
>> 192.168.56.105 and 192.168.56.106 - the hosts where the Knox instances
>> are installed.
>>
>> I used "mydomain.com" as the domain name, so to talk to Knox I use the
>> "knox.mydomain.com" host name.
>>
>> 3.
>> On the host running the tests I added one line to "/etc/resolv.conf":
>>
>> nameserver 192.168.56.104
>>
>> It points to the host where the BIND DNS server is installed. Now the
>> client host can talk to the BIND server to resolve the
>> "knox.mydomain.com" host name.
>>
>> Now here is some description of how DNS RR works (all that I managed to
>> find and understand :) ):
>>
>> 1. The DNS server answers a client's request with all possible IPs.
>> The order of the IPs changes every time - this is how the DNS server
>> does RR. To check it, use the "dig knox.mydomain.com" command. In my
>> case it contains:
>>
>> ;; ANSWER SECTION:
>> knox.mydomain.com. 86400 IN A 192.168.56.105
>> knox.mydomain.com. 86400 IN A 192.168.56.106
>>
>> The next time I got:
>>
>> ;; ANSWER SECTION:
>> knox.mydomain.com. 86400 IN A 192.168.56.106
>> knox.mydomain.com. 86400 IN A 192.168.56.105
>>
>> The client is then responsible for selecting an IP address and for
>> failover.
>>
>> 2. DNS RR is not required to support session stickiness.
>>
>> 3. Resolved IPs are cached at different layers, and the caching can be
>> configured. For a clear picture, please read
>> http://en.wikipedia.org/wiki/DNS#Record_caching. Also you can read
>> http://en.wikipedia.org/wiki/DNS#Client_lookup.
>>
>> 4. I ran some tests using the FF browser, the Knox samples, curl, and
>> the ping command. They gave me the following results:
>> - these clients internally select an IP from the received list;
>> - these clients can provide session stickiness;
>> - these clients can provide failover.
>>
>> So, I can conclude that HAProxy takes all responsibility for hiding the
>> proxied instances, load balancing, session stickiness, and failover,
>> whereas DNS RR just provides a list of all hosts (in a different order
>> every time) and delegates all responsibility to the clients.
>>
>> Maksim.
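The client-side view Maksim describes in step 1 - the resolver hands back every A record and the client picks one and handles failover - can be observed with the standard resolver API. A minimal sketch using Python's stdlib `socket.getaddrinfo`; `localhost` stands in here for the test-only `knox.mydomain.com` name, which only resolves against the BIND setup above:

```python
import socket

def resolve_all(hostname, port=80):
    """Return every IPv4 address the resolver returns for hostname,
    in the order received (an RR DNS server rotates this order)."""
    infos = socket.getaddrinfo(hostname, port,
                               socket.AF_INET, socket.SOCK_STREAM)
    seen, ips = set(), []
    for family, socktype, proto, canonname, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in seen:  # dedupe while preserving resolver order
            seen.add(ip)
            ips.append(ip)
    return ips

# "knox.mydomain.com" exists only in the test zone above, so query localhost.
print(resolve_all("localhost"))
```

A client doing its own DNS-RR failover would iterate this list and attempt a connection to each address in turn, which is exactly what the ZK client does with its ensemble list.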
