IIUC the major issue here is what happens if the client decides the session
has expired while the server still holds a valid session, and the client
then reconnects.

Waiting 4/3 of the timeout is a best-effort way to make the client expire
the session only after the server has expired it, but we need to either
prove a happens-before relation or think through the issue described above.
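
To make the 4/3 idea concrete, here is a minimal sketch. It is not the code
from the PR: `ClientSideExpiration` and its methods are hypothetical names,
and I assume the caller passes readings from a monotonic clock in
milliseconds (for example `System.nanoTime() / 1_000_000`).

```java
/** Hypothetical helper; not part of ZooKeeper or of the PR under discussion. */
final class ClientSideExpiration {
    private final long expirationTimeoutMs;
    private long lastContactMs;

    ClientSideExpiration(long negotiatedSessionTimeoutMs, long nowMs) {
        // 4/3 of the negotiated timeout: the extra third is slack so that the
        // server has most likely expired the session before the client does.
        this.expirationTimeoutMs = negotiatedSessionTimeoutMs * 4 / 3;
        this.lastContactMs = nowMs;
    }

    /** Record a successful exchange with any server (e.g. a ping response). */
    void onServerContact(long nowMs) {
        lastContactMs = nowMs;
    }

    /** True once no server has been heard from for the whole expiration timeout. */
    boolean isExpired(long nowMs) {
        return nowMs - lastContactMs >= expirationTimeoutMs;
    }
}
```

With a negotiated timeout of 30000 ms, the client would declare expiration
40000 ms after the last contact. Note that this only compares client-local
clock readings, so by itself it does not establish the happens-before
relation mentioned above.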

However, Curator has done this client-side expiration with a similar
algorithm for a long time, and I haven't heard of any issues reported. So
such a solution can be considered battle-tested.

Best,
tison.


Kezhu Wang <kez...@gmail.com> wrote on Fri, Sep 1, 2023 at 16:24:

> Hi all,
>
> A ZooKeeper session expires approximately after the negotiated session
> timeout. Currently, the client learns this only after it successfully
> contacts the ZooKeeper cluster. This leads to an endless client-side
> connection loss when the client can't reach the ZooKeeper cluster, due to
> either an incomplete connection string or downtime of the whole cluster.
>
> There is a `SessionTimeoutException` in `ClientCnxn`, but it never
> counts as session expiration.
>
> At least four JIRA issues appear to report the behavior described above.
>
> * ZOOKEEPER-2188[1]: client connection hung up because of dead loop
> * ZOOKEEPER-4412[2]: client blocked too long before session timeout
> * ZOOKEEPER-4508[3]: ZooKeeper client run to endless loop in
> ClientCnxn.SendThread.run if all server down
> * ZOOKEEPER-4692[4]: Handle SessionTimeoutException in Java client
>
> I propose adding an `expirationTimeout` to `ClientCnxn` to deal with
> this. The value could be approximately `4/3` of `connectTimeout` or
> `negotiatedSessionTimeout`, depending on the stage. I opened a PR[5] for
> evaluation.
>
> Any suggestions? Thanks!
>
> [1]: https://issues.apache.org/jira/browse/ZOOKEEPER-2188
> [2]: https://issues.apache.org/jira/browse/ZOOKEEPER-4412
> [3]: https://issues.apache.org/jira/browse/ZOOKEEPER-4508
> [4]: https://issues.apache.org/jira/browse/ZOOKEEPER-4692
> [5]: https://github.com/apache/zookeeper/pull/2058
>
> Best,
> Kezhu Wang
>
