IIUC, the major issue here is what happens if the client decides the session has expired while the server still holds a valid session, and the client then reconnects.
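To make the proposal quoted below concrete, here is a minimal Java sketch of a client-side expiration timer. The deadline is 4/3 of a base timeout, measured from the last time the client heard from the server. The base is `connectTimeout` before the session is established and the negotiated session timeout afterwards. The class and method names (`ClientSideExpiration`, `onServerResponse`, and so on) are hypothetical and are not the API of `ClientCnxn` or of PR 2058. Which timeout applies at which stage is my reading of "depending on stage".

```java
/**
 * Hypothetical sketch, not ZooKeeper's ClientCnxn: the client declares the
 * session expired once 4/3 of a base timeout has elapsed since it last
 * heard from any server. Times are passed in explicitly (milliseconds from
 * a monotonic clock, e.g. derived from System.nanoTime()) so the logic is
 * deterministic and easy to test.
 */
final class ClientSideExpiration {
    // Assumed safety factor of 4/3, kept as integers to avoid rounding surprises.
    private static final long FACTOR_NUM = 4;
    private static final long FACTOR_DEN = 3;

    // connectTimeout before the handshake, negotiated session timeout after it.
    private long baseTimeoutMs;
    // Client-local time of the last response (including ping replies) from a server.
    private long lastHeardMs;

    ClientSideExpiration(long connectTimeoutMs, long nowMs) {
        this.baseTimeoutMs = connectTimeoutMs;
        this.lastHeardMs = nowMs;
    }

    /** Handshake succeeded: switch the base to the negotiated session timeout. */
    void onSessionEstablished(long negotiatedSessionTimeoutMs, long nowMs) {
        this.baseTimeoutMs = negotiatedSessionTimeoutMs;
        this.lastHeardMs = nowMs;
    }

    /** Any response from the server proves the session was alive at that moment. */
    void onServerResponse(long nowMs) {
        this.lastHeardMs = nowMs;
    }

    /** The proposed expirationTimeout: roughly 4/3 of the current base timeout. */
    long expirationTimeoutMs() {
        return baseTimeoutMs * FACTOR_NUM / FACTOR_DEN;
    }

    /** True once the client has been cut off for at least expirationTimeoutMs(). */
    boolean isExpired(long nowMs) {
        return nowMs - lastHeardMs >= expirationTimeoutMs();
    }
}
```

For example, with a 30 s negotiated timeout and a last server response at t = 20 s, the client would give up at t = 60 s. The 4/3 margin only buys slack, though. The server measures its own deadline T from the last packet it *received*, which can be later than the last response the client saw, for example a ping whose reply was lost. Assuming both clocks run at the same rate, the client expires after the server only if that gap is shorter than T/3. That is exactly the happens-before question that needs proving.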
Using 4/3 of the timeout may, on a best-effort basis, make the client expire the session after the server does, but we would need to prove a happens-before relation or otherwise address the issues described above. That said, Curator has done this kind of client-side expiration with a similar algorithm for a long time, and I haven't heard of any issues being reported, so such a solution can be considered battle-tested.

Best,
tison

On Fri, Sep 1, 2023 at 16:24, Kezhu Wang <kez...@gmail.com> wrote:
> Hi all,
>
> A ZooKeeper session expires approximately after the negotiated session
> timeout. Currently, the client learns this only after successfully
> contacting the ZooKeeper cluster. This leads to endless client-side
> connection loss when the client can't reach the ZooKeeper cluster, due to
> either an incomplete connection string or whole-cluster downtime.
>
> There is a `SessionTimeoutException` in `ClientCnxn`, but it never
> counts as session expiration.
>
> At least four JIRA issues have reported the behavior described above:
>
> * ZOOKEEPER-2188[1]: client connection hung up because of dead loop
> * ZOOKEEPER-4412[2]: client blocked too long before session timeout
> * ZOOKEEPER-4508[3]: ZooKeeper client run to endless loop in
>   ClientCnxn.SendThread.run if all server down
> * ZOOKEEPER-4692[4]: Handle SessionTimeoutException in Java client
>
> I propose adding an `expirationTimeout` in `ClientCnxn` to deal with
> this. The value could be approximately `4/3` of `connectTimeout` or
> `negotiatedSessionTimeout`, depending on the stage. I opened a PR[5] for
> evaluation.
>
> Any suggestions? Thanks!
>
> [1]: https://issues.apache.org/jira/browse/ZOOKEEPER-2188
> [2]: https://issues.apache.org/jira/browse/ZOOKEEPER-4412
> [3]: https://issues.apache.org/jira/browse/ZOOKEEPER-4508
> [4]: https://issues.apache.org/jira/browse/ZOOKEEPER-4692
> [5]: https://github.com/apache/zookeeper/pull/2058
>
> Best,
> Kezhu Wang