Github user Randgalt commented on the issue:
https://github.com/apache/curator/pull/262
> -If the connection between client and server is lost a disconnected event
> is received essentially immediately.
> -If a heart beat is missed it takes 2/3 of the session timeout.
Thinking more about this... There are three scenarios:
1. The internal ZK client detects a missed heartbeat, closes the connection
and sends `Event.KeeperState.Disconnected` (note: this is done by
ClientCnxn.java in the SendThread class's run() method - the huge catch
handler).
2. The server detects a missed heartbeat and closes the connection, which
causes the client to see a closed connection and most likely a
`SocketException` (which is handled in that same catch clause).
3. The connection fails for other reasons (TCP errors, server crashes,
etc.).
Unfortunately, cases 2 and 3 look _exactly the same_ to the client. In case
2, the client should ideally act as if 2/3 of the session timeout has elapsed.
In case 3, the client should act as if none of the session timeout has elapsed.
Worse still, if the entire ZK cluster is down, it could conceivably come back
up and clients won't lose their sessions at all because "time" in ZK is based
on the leader's start time (I still think there's value in killing the session
on the client side anyway, as clients could otherwise be left hanging
indefinitely).
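To put numbers on the difference between cases 2 and 3, here is a minimal Java sketch. The class and method names are hypothetical (they are not part of ZooKeeper or Curator), and the 30 second timeout is just an example value. It only does the arithmetic described above: how much of the session timeout would be left at the moment of disconnect under each assumption.

```java
public class SessionBudget {
    // Hypothetical helper: remaining session time in ms, assuming that
    // elapsedNum/elapsedDen of the session timeout has already passed
    // when the disconnect is observed.
    public static long remainingMs(long sessionTimeoutMs, long elapsedNum, long elapsedDen) {
        return sessionTimeoutMs - (sessionTimeoutMs * elapsedNum) / elapsedDen;
    }

    public static void main(String[] args) {
        long sessionTimeoutMs = 30_000; // example value only

        // Case 2: the server closed the connection after a missed heartbeat,
        // so roughly 2/3 of the timeout is assumed to be gone already.
        System.out.println("case 2 remaining: " + remainingMs(sessionTimeoutMs, 2, 3)); // 10000

        // Case 3: the connection failed for an unrelated reason,
        // so none of the timeout is assumed to have elapsed.
        System.out.println("case 3 remaining: " + remainingMs(sessionTimeoutMs, 0, 1)); // 30000
    }
}
```

Since the client cannot tell the two cases apart, it cannot know which of these two budgets actually applies.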
So, even if we can detect the client-side heartbeat miss (case 1), we have no
way of telling cases 2 and 3 apart. I'm not sure what to do. Maybe leave
`StandardConnectionHandlingPolicy` as is and add a new, optional policy,
`AggressiveConnectionHandlingPolicy` that always assumes connection loss is due
to missed heartbeats? Or, possibly, just do nothing and leave things as they
are?
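A rough sketch of how the two options might differ, purely for discussion. Everything here is hypothetical: `DisconnectPolicy` is a stand-in and not Curator's real policy interface, and it assumes that "as is" means waiting the full session timeout after a disconnect, which is my reading rather than something confirmed above. The aggressive variant always assumes the disconnect came from a missed heartbeat, so only 1/3 of the timeout is left.

```java
public class ConnectionPolicySketch {
    /** Hypothetical stand-in; not Curator's actual interface. */
    public interface DisconnectPolicy {
        // How long after a Disconnected event the client treats the session as lost.
        long msUntilAssumedSessionLoss(long sessionTimeoutMs);
    }

    // Assumption: the standard behavior waits the full session timeout.
    public static final DisconnectPolicy STANDARD = timeoutMs -> timeoutMs;

    // Always assumes 2/3 of the timeout had elapsed before the disconnect.
    public static final DisconnectPolicy AGGRESSIVE =
            timeoutMs -> timeoutMs - (timeoutMs * 2) / 3;

    // True once the client should give up on the session under the given policy.
    public static boolean shouldTreatAsLost(DisconnectPolicy policy,
                                            long sessionTimeoutMs,
                                            long msSinceDisconnect) {
        return msSinceDisconnect >= policy.msUntilAssumedSessionLoss(sessionTimeoutMs);
    }

    public static void main(String[] args) {
        long timeoutMs = 30_000;        // example value only
        long sinceDisconnectMs = 12_000;
        System.out.println(shouldTreatAsLost(STANDARD, timeoutMs, sinceDisconnectMs));   // false
        System.out.println(shouldTreatAsLost(AGGRESSIVE, timeoutMs, sinceDisconnectMs)); // true
    }
}
```

The trade-off is visible in the example: 12 seconds after a disconnect, the aggressive variant has already given up on a session that, in case 3, the server may still consider alive.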
Thoughts?