Thanks everyone! Lots of interesting answers.

On Fri, Nov 5, 2010 at 6:08 AM, Patrick Hunt <ph...@apache.org> wrote:
> Great, thanks!
>
> On Thu, Nov 4, 2010 at 10:04 PM, Chang Song <tru64...@me.com> wrote:
>
> > Benjamin.
> > It looks like ZK clients can handle a list of IPs from a DNS query correctly.
> > Yes, you are right.
> >
> > I am updating the wiki per Patrick's request.
> >
> > Thanks a lot.
> >
> > Chang
> >
> > On Nov 5, 2010, at 1:10 AM, Benjamin Reed wrote:
> >
> >> one thing to note: if you are using a DNS load balancer, some load
> >> balancers will return the list of resolved addresses in different orders to
> >> do the balancing. the zookeeper client will shuffle that list before it is
> >> used, so in reality, using a single DNS hostname resolving to all the server
> >> addresses will probably work just as well as most DNS-based load balancers.
> >>
> >> ben
> >>
> >> On 11/04/2010 08:26 AM, Patrick Hunt wrote:
> >>> Hi Chang, thanks for the insights. If you have a few minutes, would you
> >>> mind updating the FAQ with some of this detail?
> >>> http://wiki.apache.org/hadoop/ZooKeeper/FAQ
> >>>
> >>> Thanks!
> >>>
> >>> Patrick
> >>>
> >>> On Thu, Nov 4, 2010 at 6:27 AM, Chang Song <tru64...@me.com> wrote:
> >>>> Sorry. I made a mistake on the retry timeout in the load balancer
> >>>> section of my answer.
> >>>> The same timeout applies to the load balancer case as well (it depends
> >>>> on the recv timeout).
> >>>>
> >>>> Thank you
> >>>>
> >>>> Chang
> >>>>
> >>>> On Nov 4, 2010, at 10:22 PM, Chang Song wrote:
> >>>>
> >>>>> I would like to add some info on this.
> >>>>>
> >>>>> This may not be very important, but there are subtle differences.
> >>>>>
> >>>>> Two cases:
> >>>>> 1. server hardware failure or kernel panic
> >>>>> 2. zookeeper Java daemon process down
> >>>>>
> >>>>> In the former case, the timeout is based on the timeout argument to
> >>>>> zookeeper_init(), and partially on the ZK heartbeat algorithm. The
> >>>>> client recognizes that the server is down after 2/3 of the timeout,
> >>>>> then retries at every timeout. For example, if the timeout is 9000 msec,
> >>>>> it first times out in 6 seconds, and retries every 9 seconds.
> >>>>>
> >>>>> In the latter case (Java process down), since the socket connect
> >>>>> immediately returns connection refused, it can retry immediately.
> >>>>>
> >>>>> On top of that,
> >>>>>
> >>>>> - Hardware load balancer:
> >>>>> If an ensemble cluster is serviced by a hardware load balancer, the
> >>>>> zookeeper client will retry every 2 seconds since we only have one IP
> >>>>> to try.
> >>>>>
> >>>>> - DNS RR:
> >>>>> Make sure that "nscd" on your linux box is off, since it is most likely
> >>>>> that the DNS cache returns the same IP many times.
> >>>>> This is actually worse than the above, since the ZK client will retry
> >>>>> the same dead server every 2 seconds for some time.
> >>>>>
> >>>>> I think it is best not to use a load balancer for ZK clients, since ZK
> >>>>> clients will try the next server immediately if the previous one fails
> >>>>> for some reason (based on the timeout above). This is especially true if
> >>>>> your cluster works in a pseudo-realtime environment where tickTime is
> >>>>> set very low.
> >>>>>
> >>>>> Chang
> >>>>>
> >>>>> On Nov 4, 2010, at 9:17 AM, Ted Dunning wrote:
> >>>>>
> >>>>>> DNS round-robin works as well.
> >>>>>>
> >>>>>> On Wed, Nov 3, 2010 at 3:45 PM, Benjamin Reed <br...@yahoo-inc.com> wrote:
> >>>>>>
> >>>>>>> it would have to be a TCP based load balancer to work with ZooKeeper
> >>>>>>> clients, but other than that it should work really well. The clients
> >>>>>>> will be doing heart beats so the TCP connections will be long lived.
> >>>>>>> The client library does random connection load balancing anyway.
> >>>>>>>
> >>>>>>> ben
> >>>>>>>
> >>>>>>> On 11/03/2010 12:19 PM, Luka Stojanovic wrote:
> >>>>>>>
> >>>>>>>> What would be the expected behavior if a three node cluster is put
> >>>>>>>> behind a load balancer? It would ease deployment because all clients
> >>>>>>>> would be configured to target zookeeper.example.com regardless of
> >>>>>>>> actual cluster configuration, but I have the impression that the
> >>>>>>>> client-server connection is stateful and that jumping randomly from
> >>>>>>>> server to server could bring strange behavior.
> >>>>>>>>
> >>>>>>>> Cheers,
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Luka Stojanovic
> >>>>>>>> lu...@vast.com
> >>>>>>>> Platform Engineering
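
For anyone skimming the thread later, here is a minimal Python sketch of the two behaviors described in the quoted messages: the timing Chang gives for the hardware-failure case (server detected as down after 2/3 of the timeout, then a retry every full timeout) and the address shuffling Ben mentions. This is not the ZooKeeper client code. The function names `failure_schedule` and `connection_order` and the hosts `zk1`..`zk3.example.com` are made up for illustration, and the timing model is only as accurate as the description in the thread.

```python
import random


def failure_schedule(session_timeout_ms, retries=3):
    """Milliseconds after the last contact at which the client reacts.

    Models the description in the thread only (an assumption, not the
    real client logic): the first reaction comes at 2/3 of the timeout,
    and each later retry follows one full timeout after the previous one.
    """
    first = session_timeout_ms * 2 // 3
    return [first + i * session_timeout_ms for i in range(retries + 1)]


def connection_order(addresses, rng=random):
    """Return a shuffled copy of the server list.

    Illustrates why the order in which DNS returns the addresses matters
    little: the list is shuffled before it is used anyway.
    """
    shuffled = list(addresses)
    rng.shuffle(shuffled)
    return shuffled


if __name__ == "__main__":
    # Chang's example: timeout of 9000 msec.
    print(failure_schedule(9000, retries=2))

    # Hypothetical hosts; 2181 is the default ZooKeeper client port.
    servers = ["zk1.example.com:2181", "zk2.example.com:2181", "zk3.example.com:2181"]
    print(connection_order(servers))
```

With a timeout of 9000 msec the schedule starts at 6000 ms and then advances by 9000 ms per retry, which matches the "first times out in 6 seconds, and retries every 9 seconds" example above.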