Re: Membership using ZK

2010-10-12 Thread Avinash Lakshman
Would my watcher get invoked on this ConnectionLoss event? If so, I am
thinking I will check for KeeperState.Disconnected and reset my state. Is my
understanding correct? Please advise.

Thanks
Avinash

On Tue, Oct 12, 2010 at 10:45 AM, Benjamin Reed br...@yahoo-inc.com wrote:

  ZooKeeper considers a client dead when it hasn't heard from that client
 during the timeout period. Clients make sure to communicate with ZooKeeper
 at least once every 1/3 of the timeout period. If the client doesn't hear
 from ZooKeeper within 2/3 of the timeout period, the client will issue a
 ConnectionLoss event and cause outstanding requests to fail with a
 ConnectionLoss error.
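
The timing described above can be written out as plain arithmetic. This is only a sketch of the 1/3 and 2/3 rule as stated in this message, not ZooKeeper's actual client code; the class name SessionTiming and its methods are made up for illustration.

```java
import java.time.Duration;

// Rough timing model taken from the description above; not ZooKeeper's code.
final class SessionTiming {
    private final Duration sessionTimeout;

    SessionTiming(Duration sessionTimeout) {
        this.sessionTimeout = sessionTimeout;
    }

    // The client contacts the server at least this often (1/3 of the timeout).
    Duration heartbeatInterval() {
        return sessionTimeout.dividedBy(3);
    }

    // If the client hears nothing for this long, it reports ConnectionLoss (2/3).
    Duration connectionLossThreshold() {
        return sessionTimeout.multipliedBy(2).dividedBy(3);
    }

    // If the server hears nothing for this long, it considers the client dead.
    Duration expiryThreshold() {
        return sessionTimeout;
    }

    public static void main(String[] args) {
        SessionTiming t = new SessionTiming(Duration.ofSeconds(30));
        System.out.println("heartbeat interval (ms): " + t.heartbeatInterval().toMillis());
        System.out.println("connection loss after (ms): " + t.connectionLossThreshold().toMillis());
        System.out.println("session expiry after (ms): " + t.expiryThreshold().toMillis());
    }
}
```

The gap between the 2/3 mark and the full timeout is the window in which a client that has noticed the loss can still reconnect before the session expires.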

 So, if ZooKeeper decides a process is dead, the process will get a
 ConnectionLoss event. Once ZooKeeper decides that a client is dead, if the
 client reconnects, the client will get a SessionExpired. Once a session is
 expired, the expired handle will become useless, so no new requests, no
 watches, etc.

 The bottom line is that if your process gets a session expired event, you
 need to treat that process as expired and recover by creating a new
 ZooKeeper handle (possibly by restarting the process) and setting up your
 state again.
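
A minimal sketch of that recovery rule, assuming the application funnels session state changes through one handler. It uses only the JDK so it runs on its own: SessionState is a stand-in for ZooKeeper's Watcher.Event.KeeperState, and SessionRecovery and SessionStateHandler are hypothetical names, not part of any ZooKeeper API.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for ZooKeeper's Watcher.Event.KeeperState, reduced to three states.
enum SessionState { SYNC_CONNECTED, DISCONNECTED, EXPIRED }

// Hypothetical application hook: open a new handle, re-create ephemeral
// znodes and re-set watches.
interface SessionRecovery {
    void rebuild();
}

final class SessionStateHandler {
    private final SessionRecovery recovery;
    private final List<String> log = new ArrayList<>();
    private boolean usable = false;

    SessionStateHandler(SessionRecovery recovery) {
        this.recovery = recovery;
    }

    void onStateChange(SessionState state) {
        switch (state) {
            case SYNC_CONNECTED:
                usable = true;
                log.add("connected");
                break;
            case DISCONNECTED:
                // The session may still be alive on the server, so keep the
                // handle and wait for the client to reconnect.
                usable = false;
                log.add("disconnected, waiting");
                break;
            case EXPIRED:
                // The old handle is useless from here on; start over.
                usable = false;
                log.add("expired, rebuilding");
                recovery.rebuild();
                break;
        }
    }

    boolean isUsable() { return usable; }

    List<String> log() { return log; }
}
```

The design choice here is to rebuild only on expiry. A disconnect on its own does not mean the ephemeral znode is gone, so the handler just stops issuing requests until the connection comes back.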

 ben


 On 10/12/2010 09:54 AM, Avinash Lakshman wrote:

 This is what I have going:

 I have a group of 200 nodes that come up and each create an ephemeral entry
 under a znode named /Membership. When a node is detected as dead, the entry
 associated with it under /Membership is deleted and a watch is delivered to
 the rest of the members. Now, there are circumstances in which a node A is
 deemed dead while the process is still up and running on A. It is a false
 detection that I probably need to deal with. How do I deal with this
 situation? Over time, false detections delete all the entries underneath
 the /Membership znode even though all processes are up and running.

 So my questions are:
 Would the watches be pushed out to the node that is falsely deemed dead? If
 so, I can have that process recreate the ephemeral znode underneath
 /Membership.
 Suppose a node leaves a watch and then truly crashes. When it comes back up,
 would it get the watches it missed during the interim period? In any case,
 how do watches behave in the event of false/true failure detection?
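
To make the questions concrete, here is a toy in-memory model of the setup, written against the JDK only. It assumes two behaviours: watches are one-shot triggers, and an expired session receives no further watches (as stated in the reply quoted above). MembershipModel and its methods are invented for this sketch and are not the ZooKeeper API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model of ephemeral entries under /Membership; not the ZooKeeper API.
final class MembershipModel {
    // session id -> member name (the "ephemeral entry" owned by that session)
    private final Map<String, String> memberBySession = new TreeMap<>();
    // session id -> one-shot callback for "children of /Membership changed"
    private final Map<String, Runnable> watchBySession = new HashMap<>();

    void join(String sessionId, String member) {
        memberBySession.put(sessionId, member);
        fireWatches();
    }

    Set<String> members() {
        return new TreeSet<>(memberBySession.values());
    }

    // Registers a watch that fires at most once; the caller must set it again.
    void watch(String sessionId, Runnable callback) {
        watchBySession.put(sessionId, callback);
    }

    // Expiry removes the session's entry and drops its watch without firing it.
    void expire(String sessionId) {
        memberBySession.remove(sessionId);
        watchBySession.remove(sessionId);
        fireWatches();
    }

    private void fireWatches() {
        List<Runnable> pending = new ArrayList<>(watchBySession.values());
        watchBySession.clear(); // one-shot: every watch is consumed when it fires
        for (Runnable callback : pending) {
            callback.run();
        }
    }
}
```

In this model a falsely expired member hears nothing through its old session. It finds out through the session expired event, then rejoins on a new session and re-reads the member list rather than expecting missed watches to be replayed.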

 Thanks
 A





Re: Membership using ZK

2010-10-12 Thread Ted Dunning
Yes.  You should get that event.

You should also debug why you are getting disconnected in the first place.
This is often a symptom of something really bad happening on your client
side, such as very long GC pauses. If these are unavoidable, then you need
to adjust the timeouts with ZK to reflect reality. Another possibility is
that your network connections are dropping or that your application is
freezing for a non-GC reason. Any of these problems is something you
should address.

Of course, the connection loss event should be handled correctly as well,
since honest-to-god disconnects can happen.
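
One way to sanity-check a timeout against observed pauses is sketched below. It assumes the rough model from earlier in the thread (the client reports a connection loss after 2/3 of the timeout without contact, and the server gives up after the full timeout) and ignores network latency. PauseImpact and its Outcome values are hypothetical names.

```java
// Rough check of how a client-side pause compares with the session timeout.
final class PauseImpact {
    enum Outcome { UNNOTICED, CONNECTION_LOSS_LIKELY, SESSION_EXPIRY_LIKELY }

    static Outcome classify(long sessionTimeoutMs, long pauseMs) {
        if (pauseMs >= sessionTimeoutMs) {
            // The server heard nothing for a whole timeout period.
            return Outcome.SESSION_EXPIRY_LIKELY;
        }
        if (pauseMs * 3 >= sessionTimeoutMs * 2) {
            // The pause covers at least 2/3 of the timeout period.
            return Outcome.CONNECTION_LOSS_LIKELY;
        }
        return Outcome.UNNOTICED;
    }

    public static void main(String[] args) {
        long worstPauseMs = 40_000; // assumed worst observed GC pause
        for (long timeoutMs : new long[] {30_000, 60_000, 90_000}) {
            System.out.println(timeoutMs + " ms timeout: " + classify(timeoutMs, worstPauseMs));
        }
    }
}
```

A longer timeout hides longer pauses but also delays detection of members that really are dead, so the value is a trade-off rather than a fix for the underlying pauses.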
