Benjamin,

It looks like ZK clients can handle a list of IPs from a DNS query correctly.
Yes, you are right.
I am updating the wiki per Patrick's request.

Thanks a lot.

Chang

On Nov 5, 2010, at 1:10 AM, Benjamin Reed wrote:

> one thing to note: if you are using a DNS load balancer, some load
> balancers will return the list of resolved addresses in different orders to
> do the balancing. the zookeeper client will shuffle that list before it is
> used, so in reality, using a single DNS hostname resolving to all the server
> addresses will probably work just as well as most DNS-based load balancers.
>
> ben
>
> On 11/04/2010 08:26 AM, Patrick Hunt wrote:
>> Hi Chang, thanks for the insights, if you have a few minutes would you
>> mind updating the FAQ with some of this detail?
>> http://wiki.apache.org/hadoop/ZooKeeper/FAQ
>>
>> Thanks!
>>
>> Patrick
>>
>> On Thu, Nov 4, 2010 at 6:27 AM, Chang Song<tru64...@me.com> wrote:
>>> Sorry. I made a mistake on the retry timeout in the load balancer
>>> section of my answer.
>>> The same timeout applies to the load balancer case as well (it depends
>>> on the recv timeout).
>>>
>>> Thank you
>>>
>>> Chang
>>>
>>>
>>> On Nov 4, 2010, at 10:22 PM, Chang Song wrote:
>>>
>>>> I would like to add some info on this.
>>>>
>>>> This may not be very important, but there are subtle differences.
>>>>
>>>> Two cases: 1. server hardware failure or kernel panic
>>>>            2. zookeeper Java daemon process down
>>>>
>>>> In the former case, the timeout is based on the timeout argument to
>>>> zookeeper_init(), and partially on the ZK heartbeat algorithm.
>>>> It recognizes the server as down in 2/3 of the timeout, then retries
>>>> at every timeout. For example, if the timeout is 9000 msec, it
>>>> first times out in 6 seconds, and retries every 9 seconds.
>>>>
>>>> In the latter case (Java process down), since socket connect
>>>> immediately returns connection refused, it can retry immediately.
>>>>
>>>> On top of that,
>>>>
>>>> - Hardware load balancer:
>>>> If an ensemble cluster is serviced by a hardware load balancer, the
>>>> zookeeper client will retry every 2 seconds since we only have one IP
>>>> to try.
>>>>
>>>> - DNS RR:
>>>> Make sure that "nscd" on your Linux box is off, since it is most likely
>>>> that the DNS cache returns the same IP many times.
>>>> This is actually worse than the above, since the ZK client will retry
>>>> the same dead server every 2 seconds for some time.
>>>>
>>>>
>>>> I think it is best not to use a load balancer for ZK clients, since ZK
>>>> clients will try the next server immediately if the previous one fails
>>>> for some reason (based on the timeout above). And this is especially
>>>> true if your cluster works in a pseudo-realtime environment where
>>>> tickTime is set very low.
>>>>
>>>>
>>>> Chang
>>>>
>>>>
>>>> On Nov 4, 2010, at 9:17 AM, Ted Dunning wrote:
>>>>
>>>>> DNS round-robin works as well.
>>>>>
>>>>> On Wed, Nov 3, 2010 at 3:45 PM, Benjamin Reed<br...@yahoo-inc.com> wrote:
>>>>>
>>>>>> it would have to be a TCP based load balancer to work with ZooKeeper
>>>>>> clients, but other than that it should work really well. The clients
>>>>>> will be doing heart beats so the TCP connections will be long lived.
>>>>>> The client library does random connection load balancing anyway.
>>>>>>
>>>>>> ben
>>>>>>
>>>>>> On 11/03/2010 12:19 PM, Luka Stojanovic wrote:
>>>>>>
>>>>>>> What would be the expected behavior if a three node cluster is put
>>>>>>> behind a load balancer? It would ease deployment because all clients
>>>>>>> would be configured to target zookeeper.example.com regardless of
>>>>>>> actual cluster configuration, but I have the impression that the
>>>>>>> client-server connection is stateful and that jumping randomly from
>>>>>>> server to server could bring strange behavior.
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> --
>>>>>>> Luka Stojanovic
>>>>>>> lu...@vast.com
>>>>>>> Platform Engineering
>>>>>>>
>>>>>>
>>>
>
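
For anyone who wants to play with the numbers in this thread, here is a rough Python sketch of two points made above: the "2/3 of the timeout, then every timeout" schedule for a dead host, and the client shuffling the list of resolved addresses. It is only an illustration under my reading of the thread, not the actual client code. The function names and the 10.0.0.x addresses are made up for the example, and the assumption that each retry lands one full timeout after the previous attempt is mine.

```python
import random


def failure_detection_schedule(session_timeout_ms, retries=3):
    """Times (in ms) at which a dead host is first detected and then retried.

    Assumption taken from the thread: the first detection happens at 2/3 of
    the session timeout, and every later retry follows one full timeout
    after the previous attempt.
    """
    first_detect = session_timeout_ms * 2 // 3
    return [first_detect + i * session_timeout_ms for i in range(retries + 1)]


def shuffled_servers(resolved_addresses, seed=None):
    """Return a shuffled copy of the addresses a single hostname resolved to.

    This only mimics the idea that the client randomizes the server order;
    it does not touch DNS and leaves the input list unchanged.
    """
    rng = random.Random(seed)
    servers = list(resolved_addresses)
    rng.shuffle(servers)
    return servers


if __name__ == "__main__":
    # With a 9000 msec timeout: first detection at 6 s, then every 9 s.
    print(failure_detection_schedule(9000, retries=2))  # [6000, 15000, 24000]

    # Hypothetical addresses behind one DNS name.
    print(shuffled_servers(["10.0.0.1", "10.0.0.2", "10.0.0.3"]))
```

The shuffle is why a single hostname resolving to every server spreads clients about as well as a DNS round-robin setup would, provided the resolver cache hands back the full list rather than one repeated IP.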