FYI: http://wiki.apache.org/hadoop/ZooKeeper/Troubleshooting
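
For what it's worth, here is a rough sketch of the distinction Ben draws in the thread below: a connection loss means keep the handle and wait, while a session expiry means throw the handle away and rebuild. It deliberately avoids the real ZooKeeper client classes. `MembershipSession`, its `State` and `Action` enums, and the step names are all hypothetical stand-ins, so treat it as a model of the decision logic rather than working client code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the session handling discussed in this thread.
// None of these types are the real ZooKeeper client classes.
final class MembershipSession {
    /** Stand-ins for the three situations a watcher has to tell apart. */
    enum State { CONNECTED, DISCONNECTED, EXPIRED }

    /** What the application does in response. */
    enum Action { RESUME, PAUSE_AND_WAIT, REBUILD }

    private int handleGeneration = 1;   // bumps each time a new handle is created
    private boolean paused = false;     // true while the connection is down
    private final List<String> steps = new ArrayList<>();

    Action onStateChange(State state) {
        switch (state) {
            case DISCONNECTED:
                // Connection lost, session not yet known to be expired:
                // keep the handle, stop acting on possibly stale membership.
                paused = true;
                steps.add("pause");
                return Action.PAUSE_AND_WAIT;
            case CONNECTED:
                // Reconnected inside the session timeout: same handle, carry on.
                paused = false;
                steps.add("resume");
                return Action.RESUME;
            case EXPIRED:
                // The old handle is useless now; start over with a new one
                // and recreate the ephemeral entry under /Membership.
                handleGeneration++;
                paused = false;
                steps.add("new handle " + handleGeneration);
                steps.add("recreate ephemeral under /Membership");
                steps.add("re-register watches");
                return Action.REBUILD;
            default:
                throw new IllegalArgumentException("unknown state: " + state);
        }
    }

    int handleGeneration() { return handleGeneration; }
    boolean isPaused() { return paused; }
    List<String> steps() { return steps; }

    public static void main(String[] args) {
        MembershipSession s = new MembershipSession();
        s.onStateChange(State.DISCONNECTED);
        s.onStateChange(State.EXPIRED);
        System.out.println(s.steps());
    }
}
```

In a real client the same branching would live in the watcher callback, keyed on the state carried by the event. The point of the sketch is only that the disconnected case and the expired case call for different reactions.
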

On Tue, Oct 12, 2010 at 2:23 PM, Benjamin Reed <br...@yahoo-inc.com> wrote:

> Yes, your watcher objects will get the ConnectionLoss event and eventually
> the SessionExpired event.
>
> ben
>
> On 10/12/2010 10:57 AM, Avinash Lakshman wrote:
>
>> Would my watcher get invoked on this ConnectionLoss event? If so, I am
>> thinking I will check for KeeperState.Disconnected and reset my state. Is
>> my understanding correct? Please advise.
>>
>> Thanks
>> Avinash
>>
>> On Tue, Oct 12, 2010 at 10:45 AM, Benjamin Reed <br...@yahoo-inc.com>
>> wrote:
>>
>>> ZooKeeper considers a client dead when it hasn't heard from that client
>>> during the timeout period. Clients make sure to communicate with
>>> ZooKeeper at least once in 1/3 the timeout period. If the client doesn't
>>> hear from ZooKeeper in 2/3 the timeout period, the client will issue a
>>> ConnectionLoss event and cause outstanding requests to fail with a
>>> ConnectionLoss.
>>>
>>> So, if ZooKeeper decides a process is dead, the process will get a
>>> ConnectionLoss event. Once ZooKeeper decides that a client is dead, if
>>> the client reconnects, the client will get a SessionExpired. Once a
>>> session is expired, the expired handle will become useless, so no new
>>> requests, no watches, etc.
>>>
>>> The bottom line is if your process gets a SessionExpired, you need to
>>> treat that session as expired and recover by creating a new ZooKeeper
>>> handle (possibly by restarting the process) and setting up your state
>>> again.
>>>
>>> ben
>>>
>>> On 10/12/2010 09:54 AM, Avinash Lakshman wrote:
>>>
>>>> This is what I have going:
>>>>
>>>> I have a bunch of 200 nodes come up and create an ephemeral entry under
>>>> a znode named /Membership. When nodes are detected dead, the node
>>>> associated with the dead node under /Membership is deleted and a watch
>>>> is delivered to the rest of the members. Now there are circumstances
>>>> where a node A is deemed dead while the process is still up and running
>>>> on A. It is a false detection which I probably need to deal with. How do
>>>> I deal with this situation? Over time, false detections delete all the
>>>> entries underneath the /Membership znode even though all processes are
>>>> up and running.
>>>>
>>>> So my questions are:
>>>>
>>>> Would the watches be pushed out to the node that is falsely deemed dead?
>>>> If so, I can have that process recreate the ephemeral znode underneath
>>>> /Membership.
>>>>
>>>> If a node leaves a watch and then truly crashes, would it get the
>>>> watches it missed during the interim period when it comes back up? In
>>>> any case, how do watches behave in the event of false/true failure
>>>> detection?
>>>>
>>>> Thanks
>>>> A
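
To put numbers on the timeout rule Ben gives above (contact at least once per 1/3 of the timeout, ConnectionLoss on the client after 2/3 of silence, client considered dead after the full timeout), here is a small arithmetic sketch. `SessionTimings` and its field names are made up for illustration and are not part of any ZooKeeper API. Integer division and the `>=` comparisons at the boundaries are my assumptions.

```java
// Hypothetical helper that turns a session timeout into the three
// thresholds described in the thread. Not a ZooKeeper class.
final class SessionTimings {
    final long pingIntervalMs;     // client contacts the server at least this often (1/3)
    final long connectionLossMs;   // client reports ConnectionLoss after this much silence (2/3)
    final long serverExpiryMs;     // server considers the client dead after this much silence

    SessionTimings(long sessionTimeoutMs) {
        if (sessionTimeoutMs <= 0) {
            throw new IllegalArgumentException("timeout must be positive");
        }
        this.pingIntervalMs = sessionTimeoutMs / 3;
        this.connectionLossMs = sessionTimeoutMs * 2 / 3;
        this.serverExpiryMs = sessionTimeoutMs;
    }

    /** True once the client has gone long enough without hearing from the server. */
    boolean clientSeesConnectionLoss(long msSinceServerHeard) {
        return msSinceServerHeard >= connectionLossMs;
    }

    /** True once the server has gone long enough without hearing from the client. */
    boolean serverConsidersClientDead(long msSinceClientHeard) {
        return msSinceClientHeard >= serverExpiryMs;
    }

    public static void main(String[] args) {
        SessionTimings t = new SessionTimings(30_000);
        System.out.println("ping every " + t.pingIntervalMs + " ms");
        System.out.println("ConnectionLoss after " + t.connectionLossMs + " ms of silence");
        System.out.println("expiry after " + t.serverExpiryMs + " ms of silence");
    }
}
```

With a 30-second timeout this gives 10 s, 20 s and 30 s. Any stall in the client process that lasts longer than the last figure would land it in the "falsely deemed dead" case from the original question.
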