Re: Running cluster behind load balancer

Chang Song Thu, 04 Nov 2010 22:05:31 -0700

Benjamin,
Yes, you are right.
It looks like ZK clients correctly handle a list of IPs returned by a DNS query.
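
For anyone reading the archive, here is a rough sketch of the idea: one hostname resolves to several addresses, and the client shuffles that list itself before picking a server. This is only an illustration in Python, not the ZooKeeper client's code. The function names, the injectable resolver argument, and the use of port 2181 are my own assumptions for the example.

```python
import random
import socket


def resolve_all(host, port, resolver=socket.getaddrinfo):
    """Return every distinct IP the resolver reports for host, in resolver order."""
    infos = resolver(host, port, type=socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def shuffled_servers(host, port=2181, rng=None, resolver=socket.getaddrinfo):
    """Resolve host, then put the addresses in a client-side random order."""
    rng = rng or random.Random()
    servers = resolve_all(host, port, resolver)
    # Shuffling here means the order returned by DNS does not matter.
    rng.shuffle(servers)
    return servers
```

Calling shuffled_servers("zookeeper.example.com") would return the same set of addresses in a different order on each client, which is why the order DNS hands back should matter little.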


I am updating the wiki per Patrick's request.

Thanks a lot.

Chang



On Nov 5, 2010, at 1:10 AM, Benjamin Reed wrote:

> one thing to note: if you are using a DNS load balancer, some load
> balancers will return the list of resolved addresses in different orders to
> do the balancing. the zookeeper client will shuffle that list before it is
> used, so in reality, using a single DNS hostname resolving to all the server
> addresses will probably work just as well as most DNS-based load balancers.
> 
> ben
> 
> On 11/04/2010 08:26 AM, Patrick Hunt wrote:
>> Hi Chang, thanks for the insights, if you have a few minutes would you
>> mind updating the FAQ with some of this detail?
>> http://wiki.apache.org/hadoop/ZooKeeper/FAQ
>> 
>> Thanks!
>> 
>> Patrick
>> 
>> On Thu, Nov 4, 2010 at 6:27 AM, Chang Song<tru64...@me.com>  wrote:
>>> Sorry, I made a mistake about the retry timeout in the load balancer
>>> section of my answer.
>>> The same timeout applies to the load balancer case as well (it depends on
>>> the recv timeout).
>>> 
>>> Thank you
>>> 
>>> Chang
>>> 
>>> 
>>> On Nov 4, 2010, at 10:22 PM, Chang Song wrote:
>>> 
>>>> I would like to add some info on this.
>>>> 
>>>> This may not be very important, but there are subtle differences.
>>>> 
>>>> Two cases:
>>>>   1. server hardware failure or kernel panic
>>>>   2. ZooKeeper Java daemon process down
>>>> 
>>>> In the former case, the timeout is based on the timeout argument to
>>>> zookeeper_init(), and partially on the ZK heartbeat algorithm. The client
>>>> recognizes that the server is down after 2/3 of the timeout, then retries
>>>> at every timeout. For example, if the timeout is 9000 msec, it first
>>>> times out after 6 seconds, then retries every 9 seconds.
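
The arithmetic above (detection at 2/3 of the timeout, then a retry every full timeout) can be sketched as a small Python helper. This only restates the numbers from the message. The function name and the retries parameter are made up for the example, and it is not taken from the client source.

```python
def failure_schedule(timeout_ms, retries=2):
    """Times (in ms) at which a dead server is detected, then retried.

    Assumes detection after 2/3 of the timeout and one retry per full
    timeout afterwards, as described in this message.
    """
    detect_ms = timeout_ms * 2 // 3
    return [detect_ms + i * timeout_ms for i in range(retries + 1)]


# With a 9000 msec timeout: detection at 6 s, then retries 9 s apart.
schedule = failure_schedule(9000)
```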
>>>> 
>>>> In the latter case (Java process down), socket connect immediately
>>>> returns connection refused, so the client can retry immediately.
>>>> 
>>>> On top of that,
>>>> 
>>>> - Hardware load balancer:
>>>> If an ensemble cluster is served through a hardware load balancer, the
>>>> zookeeper client will retry every 2 seconds, since it has only one IP
>>>> to try.
>>>> 
>>>> - DNS RR:
>>>> Make sure that "nscd" on your Linux box is off, since the DNS cache will
>>>> most likely return the same IP many times.
>>>> This is actually worse than the above, since the ZK client will retry
>>>> the same dead server every 2 seconds for some time.
>>>> 
>>>> 
>>>> I think it is best not to use a load balancer for ZK clients, since ZK
>>>> clients will try the next server immediately if the previous one fails
>>>> for some reason (based on the timeout above). This is especially true
>>>> if your cluster works in a pseudo-realtime environment where tickTime
>>>> is set very low.
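
As a rough illustration of why a server list helps, here is a minimal Python sketch of a client that moves on to the next address as soon as one fails. It is not the ZooKeeper client's actual logic. The function name, the timeout_s default, and the injectable connect argument are assumptions made for the example.

```python
import socket


def connect_first_available(servers, timeout_s=6.0,
                            connect=socket.create_connection):
    """Try each (host, port) in order; return (server, sock) for the first success."""
    last_error = None
    for server in servers:
        try:
            return server, connect(server, timeout=timeout_s)
        except OSError as exc:
            # A refused connection (process down) fails at once, while a
            # dead host costs up to timeout_s before the next one is tried.
            last_error = exc
    raise ConnectionError(
        f"no server reachable out of {len(servers)}") from last_error
```

With only one address behind a load balancer, the loop has nothing else to try, so every failure has to wait for the next retry against the same IP.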
>>>> 
>>>> 
>>>> Chang
>>>> 
>>>> 
>>>> On Nov 4, 2010, at 9:17 AM, Ted Dunning wrote:
>>>> 
>>>>> DNS round-robin works as well.
>>>>> 
>>>>> On Wed, Nov 3, 2010 at 3:45 PM, Benjamin Reed<br...@yahoo-inc.com>  wrote:
>>>>> 
>>>>>> it would have to be a TCP based load balancer to work with ZooKeeper
>>>>>> clients, but other than that it should work really well. The clients
>>>>>> will be doing heart beats so the TCP connections will be long lived.
>>>>>> The client library does random connection load balancing anyway.
>>>>>> 
>>>>>> ben
>>>>>> 
>>>>>> On 11/03/2010 12:19 PM, Luka Stojanovic wrote:
>>>>>> 
>>>>>>> What would be the expected behavior if a three node cluster is put
>>>>>>> behind a load balancer? It would ease deployment because all clients
>>>>>>> would be configured to target zookeeper.example.com regardless of the
>>>>>>> actual cluster configuration, but I have the impression that the
>>>>>>> client-server connection is stateful and that jumping randomly from
>>>>>>> server to server could bring strange behavior.
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> 
>>>>>>> --
>>>>>>> Luka Stojanovic
>>>>>>> lu...@vast.com
>>>>>>> Platform Engineering
>>>>>>> 
>>>>>> 
>>> 
>
