Hi all,

A ZooKeeper session expires approximately after the negotiated session
timeout. Currently, the client learns of this only after it successfully
contacts the ZooKeeper cluster again. As a result, when the client cannot
reach the cluster, due to either an incomplete connection string or
whole-cluster downtime, it sees endless client-side connection loss and
never observes session expiration.

There is a `SessionTimeoutException` in `ClientCnxn`, but it is never
treated as session expiration.

At least four JIRA issues appear to report the behavior described above:

* ZOOKEEPER-2188[1]: client connection hung up because of dead loop
* ZOOKEEPER-4412[2]: client blocked too long before session timeout
* ZOOKEEPER-4508[3]: ZooKeeper client run to endless loop in
ClientCnxn.SendThread.run if all server down
* ZOOKEEPER-4692[4]: Handle SessionTimeoutException in Java client

I propose adding an `expirationTimeout` to `ClientCnxn` to handle this.
Its value could be approximately `4/3` of `connectTimeout` or of
`negotiatedSessionTimeout`, depending on the stage (before or after the
session is established). I have opened a PR[5] for evaluation.
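A rough sketch of the idea, for discussion only. This is not the code in the PR: the class `ExpirationTracker`, its methods, and the assumption that elapsed time is measured from the last successful server contact are all hypothetical. It picks `connectTimeout` before a session exists and `negotiatedSessionTimeout` afterwards, scales the chosen value by 4/3, and reports local expiration once that much time passes without contact.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch, not ZooKeeper's ClientCnxn code: decides on the client
 * side that a session should be treated as expired after roughly 4/3 of the
 * stage-dependent timeout has elapsed without server contact.
 */
public class ExpirationTracker {
    private final long connectTimeoutMs;
    private long negotiatedSessionTimeoutMs; // 0 until a session is established
    private long lastContactNanos;           // assumed reference point for elapsed time

    public ExpirationTracker(long connectTimeoutMs, long nowNanos) {
        this.connectTimeoutMs = connectTimeoutMs;
        this.lastContactNanos = nowNanos;
    }

    /** Hypothetical hook: the server has confirmed the session and its timeout. */
    public void onSessionEstablished(long negotiatedSessionTimeoutMs, long nowNanos) {
        this.negotiatedSessionTimeoutMs = negotiatedSessionTimeoutMs;
        this.lastContactNanos = nowNanos;
    }

    /** Hypothetical hook: any successful exchange with a server. */
    public void onServerContact(long nowNanos) {
        this.lastContactNanos = nowNanos;
    }

    /** Chooses the base timeout by stage, then scales it by 4/3. */
    public long expirationTimeoutMs() {
        long base = negotiatedSessionTimeoutMs > 0
                ? negotiatedSessionTimeoutMs
                : connectTimeoutMs;
        return base * 4 / 3;
    }

    /** True once the time since the last contact reaches the expiration timeout. */
    public boolean isExpired(long nowNanos) {
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(nowNanos - lastContactNanos);
        return elapsedMs >= expirationTimeoutMs();
    }

    public static void main(String[] args) {
        long t0 = 0L;
        ExpirationTracker tracker = new ExpirationTracker(3000, t0);
        // Before the session is established: 3000 ms * 4/3 = 4000 ms.
        System.out.println(tracker.expirationTimeoutMs());
        System.out.println(tracker.isExpired(t0 + TimeUnit.MILLISECONDS.toNanos(3999)));
        System.out.println(tracker.isExpired(t0 + TimeUnit.MILLISECONDS.toNanos(4000)));

        // After the session is established with a 30 s negotiated timeout.
        tracker.onSessionEstablished(30000, t0);
        System.out.println(tracker.expirationTimeoutMs());
    }
}
```

Time is passed in explicitly (e.g. from `System.nanoTime()`) so the logic can be tested without sleeping; a real client would instead check `isExpired` from its send loop and surface expiration to the application instead of retrying forever.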

Any suggestions? Thanks!

[1]: https://issues.apache.org/jira/browse/ZOOKEEPER-2188
[2]: https://issues.apache.org/jira/browse/ZOOKEEPER-4412
[3]: https://issues.apache.org/jira/browse/ZOOKEEPER-4508
[4]: https://issues.apache.org/jira/browse/ZOOKEEPER-4692
[5]: https://github.com/apache/zookeeper/pull/2058

Best,
Kezhu Wang
