FYI:
http://wiki.apache.org/hadoop/ZooKeeper/Troubleshooting

On Tue, Oct 12, 2010 at 2:23 PM, Benjamin Reed <br...@yahoo-inc.com> wrote:

> Yes, your watcher objects will get the ConnectionLoss event (delivered to
> watchers as KeeperState.Disconnected) and, eventually, the session expired
> event (KeeperState.Expired).
>
> ben
>
>
> On 10/12/2010 10:57 AM, Avinash Lakshman wrote:
>
>> Would my watcher get invoked on this ConnectionLoss event? If so, I am
>> thinking I will check for KeeperState.Disconnected and reset my state. Is my
>> understanding correct? Please advise.
>>
>> Thanks
>> Avinash
>>
>> On Tue, Oct 12, 2010 at 10:45 AM, Benjamin Reed <br...@yahoo-inc.com>
>> wrote:
>>
>>> ZooKeeper considers a client dead when it hasn't heard from that client
>>> within the session timeout period. Clients make sure to communicate with
>>> ZooKeeper at least once every 1/3 of the timeout period. If the client
>>> doesn't hear from ZooKeeper within 2/3 of the timeout period, the client
>>> will issue a ConnectionLoss event and cause outstanding requests to fail
>>> with a ConnectionLoss.
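
For concreteness, both intervals can be computed from the session timeout. This small sketch only restates the 1/3 and 2/3 rule above. The `SessionTimings` class and its methods are hypothetical and not part of the ZooKeeper client API. The 30 s timeout is an arbitrary example value.

```java
// Hypothetical helper that restates the 1/3 and 2/3 rule described above.
// It is not part of the ZooKeeper client API.
public class SessionTimings {
    // The client must talk to ZooKeeper at least this often (1/3 of the timeout).
    static long pingIntervalMs(long sessionTimeoutMs) {
        return sessionTimeoutMs / 3;
    }

    // If the client hears nothing for this long (2/3 of the timeout),
    // it raises ConnectionLoss and fails outstanding requests.
    static long connectionLossAfterMs(long sessionTimeoutMs) {
        return sessionTimeoutMs * 2 / 3;
    }

    public static void main(String[] args) {
        long timeout = 30_000; // example session timeout: 30 s
        System.out.println("ping every " + pingIntervalMs(timeout) + " ms");
        // prints "ping every 10000 ms"
        System.out.println("ConnectionLoss after " + connectionLossAfterMs(timeout) + " ms of silence");
        // prints "ConnectionLoss after 20000 ms of silence"
    }
}
```

With a 30 s timeout, a client that has heard nothing for 20 s reports ConnectionLoss. That leaves it roughly the remaining 10 s to reach a server before the session can be expired.
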
>>>
>>> So, if ZooKeeper decides a process is dead, the process will get a
>>> ConnectionLoss event. Once ZooKeeper decides that a client is dead, if the
>>> client reconnects, the client will get a SessionExpired event. Once a
>>> session has expired, the expired handle becomes useless: no new requests,
>>> no watches, etc.
>>>
>>> The bottom line is that if your process gets a session expired event, you
>>> need to treat that process as expired and recover by creating a new
>>> ZooKeeper handle (possibly by restarting the process) and setting up your
>>> state again.
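
Here is a minimal sketch of that advice, applied to the /Membership setup described further down. It models the session states with a stand-in enum whose constants only mirror the names of the Java client's KeeperState values (SyncConnected, Disconnected, Expired). It is not the real org.apache.zookeeper type. `MembershipAgent` and its `connect()` step are hypothetical placeholders for creating a new ZooKeeper handle and re-creating the ephemeral znode. The point is the split between the two states. Disconnected means the session may still be alive, so the handle is kept. Expired means the handle is dead and state must be rebuilt.

```java
import java.util.ArrayList;
import java.util.List;

public class MembershipAgent {
    // Stand-in enum mirroring the names of the client's KeeperState values;
    // this is NOT the real org.apache.zookeeper type.
    enum State { SyncConnected, Disconnected, Expired }

    private int handleGeneration = 0;  // incremented each time a new handle is created
    private boolean suspended = false; // true while the connection is lost
    final List<String> log = new ArrayList<>();

    MembershipAgent() {
        connect();
    }

    // Hypothetical placeholder for "new ZooKeeper(...)" followed by
    // re-creating this process's ephemeral znode under /Membership.
    private void connect() {
        handleGeneration++;
        suspended = false;
        log.add("handle " + handleGeneration + ": register /Membership entry");
    }

    void onEvent(State state) {
        switch (state) {
            case Disconnected:
                // The session may still be alive and the client library keeps
                // trying to reconnect, so keep the handle and only pause work
                // that needs an up-to-date view.
                suspended = true;
                log.add("disconnected: pause, keep handle " + handleGeneration);
                break;
            case SyncConnected:
                // Reconnected within the session: the ephemeral znode still exists.
                suspended = false;
                log.add("reconnected: resume with handle " + handleGeneration);
                break;
            case Expired:
                // The old handle is useless (no requests, no watches), so treat
                // this process as expired and rebuild state with a new handle.
                log.add("expired: discard handle " + handleGeneration);
                connect();
                break;
        }
    }

    int generation() { return handleGeneration; }
    boolean isSuspended() { return suspended; }

    public static void main(String[] args) {
        MembershipAgent agent = new MembershipAgent();
        agent.onEvent(State.Disconnected);
        agent.onEvent(State.SyncConnected);
        agent.onEvent(State.Disconnected);
        agent.onEvent(State.Expired);
        agent.log.forEach(System.out::println);
    }
}
```

Under this model, resetting state on Disconnected, as proposed in the question above, would be premature. The client may reconnect within the session timeout, and in that case the ephemeral znode is still in place.
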
>>>
>>> ben
>>>
>>>
>>> On 10/12/2010 09:54 AM, Avinash Lakshman wrote:
>>>
>>>> This is what I have going:
>>>>
>>>> I have 200 nodes that come up and create an ephemeral entry under a znode
>>>> named /Membership. When a node is detected as dead, its entry under
>>>> /Membership is deleted and a watch is delivered to the rest of the
>>>> members. Now, there are circumstances in which a node A is deemed dead
>>>> while the process on A is still up and running. This is a false
>>>> detection, which I probably need to deal with. How do I deal with this
>>>> situation? Over time, false detections delete all the entries underneath
>>>> the /Membership znode even though all processes are up and running.
>>>>
>>>> So my questions are:
>>>> Would the watches be pushed out to the node that is falsely deemed dead?
>>>> If so, I can have that process recreate the ephemeral znode underneath
>>>> /Membership.
>>>> If a node leaves a watch and then truly crashes, will it get the watches
>>>> it missed during the interim period when it comes back up? In any case,
>>>> how do watches behave in the event of false/true failure detection?
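
On the second question: Ben's reply above says an expired handle gets no watches, and a restarted process runs under a new session. So notifications fired in the meantime are not replayed. One way to catch up, sketched here under that assumption, has three steps. On startup, re-read the children of /Membership. Compare them with the last membership the process had recorded, if it kept one (for example on local disk, which is an assumption here). Then set a fresh watch. The `MembershipDiff` class below is hypothetical and covers only the comparison step.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: compares the member list a process remembered with the
// children of /Membership it reads after a restart or a new session. Events
// missed while the old session was gone are recovered from this diff, not
// from watch notifications.
public class MembershipDiff {
    // Members present now that were not known before.
    static Set<String> joined(Set<String> before, Set<String> now) {
        Set<String> result = new TreeSet<>(now);
        result.removeAll(before);
        return result;
    }

    // Members known before that are no longer present.
    static Set<String> left(Set<String> before, Set<String> now) {
        Set<String> result = new TreeSet<>(before);
        result.removeAll(now);
        return result;
    }

    public static void main(String[] args) {
        Set<String> before = Set.of("nodeA", "nodeB", "nodeC");
        Set<String> now = Set.of("nodeB", "nodeC", "nodeD");
        System.out.println("joined: " + joined(before, now)); // prints "joined: [nodeD]"
        System.out.println("left: " + left(before, now));     // prints "left: [nodeA]"
    }
}
```
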
>>>>
>>>> Thanks
>>>> A
>>>>
>>>>
>>>
>
