Kezhu, thanks for the great investigation!

I am also +1 for the second approach. This seems like the right balance of
resilience and compatibility with established applications.

Chris Nauroth


On Thu, Apr 24, 2025 at 8:07 AM Kezhu Wang <kez...@apache.org> wrote:

> Hi all,
>
> Before ZOOKEEPER-4508[1], clients could run into endless session
> reestablishment if zk cluster is unavailable.
>
> 1. For established sessions, the client tries to reestablish the
> session endlessly. Thus, the client will not get `Expired` until the
> cluster comes back.
> 2. In the case of no established session, the client tries to
> establish a brand-new session endlessly.
>
> I authored pr-2058[2] to fix them both by issuing `Expired` after
> approximately 4/3 * sessionTimeout as I think:
> 1. The first is a bug from clients' perspective. `Expired` deserves
> prompt awareness.
> 2. Clients should be aware of failure after a period of time, so as
> not to block clients indefinitely.
>
> In investigating ZOOKEEPER-4921[3], I found that Twitter Finagle[4]
> probably depends on the endless retry to establish a brand-new
> session.
>
> I think we have roads to go:
> 1. Treat endless retry in case of unestablished session as a bug. This
> will require third-party applications to fix themselves once they
> depend on the endless retry. I could author a fix for Finagle. But I
> doubt whether this endless retry is an already established feature for
> some third libraries.
> 2. Rollback to endless retry in 3.9.4. And provide customized
> `newSessionTimeout` to control timeout to establish a brand-new
> session through `ZooKeeperBuilder`[5] in 3.10. Together with
> ZOOKEEPER-4508, we will get `Expired` for established sessions and
> gain timeout control in establishing brand-new sessions without
> breaking possible third party applications.
>
> I am kindly leaning towards the second approach.
>
> What do you think ?
>
> Best,
> Kezhu
>
> [1]: https://issues.apache.org/jira/browse/ZOOKEEPER-4508
> [2]: https://github.com/apache/zookeeper/pull/2058
> [3]: https://issues.apache.org/jira/browse/ZOOKEEPER-4921
> [4]:
> https://github.com/twitter/finagle/blob/develop/finagle-serversets/src/main/java/com/twitter/finagle/common/zookeeper/ZooKeeperClient.java#L357
> [5]: https://issues.apache.org/jira/browse/ZOOKEEPER-4697
>

Reply via email to