Chuong Tran created ZOOKEEPER-4921:
--------------------------------------

             Summary: Zookeeper Client 3.9.3 Fails to Reconnect After Network Failures
                 Key: ZOOKEEPER-4921
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4921
             Project: ZooKeeper
          Issue Type: Bug
          Components: java client
    Affects Versions: 3.9.3
            Reporter: Chuong Tran

After upgrading the Java ZooKeeper client to version 3.9.3, we observed that it is not resilient to brief network disruptions, such as a short VPN blip. In such cases, the client attempts to reconnect only once, and if that attempt fails, the session expires.
{quote}Apr 23, 2025 10:19:23 AM com.twitter.finagle.common.zookeeper.ZooKeeperClient$3 process
INFO: Zookeeper session expired. Event: WatchedEvent state:Expired type:None path:null zxid: -1
{quote}
In contrast, the previous version (3.9.2) would keep retrying until the network connection was restored, so the session survived the disruption.

I believe this is a regression introduced by this change: https://issues.apache.org/jira/browse/ZOOKEEPER-4508

Steps to reproduce:
# Connect to the VPN.
# Start the application, which connects to the ZooKeeper server over the VPN.
# Disable the VPN for a couple of minutes.
# Observe the application.
# Enable the VPN again.

{quote}3.9.3:
"message" : "Session 0x0 for server XXX, Closing socket connection. Attempting reconnect except it is a SessionExpiredException or SessionTimeoutException.",
"stackTrace" : "o.a.z.ClientCnxn$SessionTimeoutException: Client session timed out, have not heard from server in 5590ms for session id 0x0
at o.a.z.ClientCnxn$SendThread.run(ClientCnxn.java:1253)
{quote}
3.9.2: The application reconnects successfully.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
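
Until the reconnect behavior is resolved, an application can at least recover from the Expired event shown in the log above by building a new client handle, since an expired session cannot be revived on the old one. The following is a minimal, library-free sketch of that pattern, not the reporter's code and not part of ZooKeeper. The names SessionState and SessionRecovery are hypothetical; in a real application the state would come from WatchedEvent.getState() inside Watcher.process, and the factory would call new ZooKeeper(connectString, sessionTimeout, watcher).

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical stand-in for ZooKeeper's Watcher.Event.KeeperState. */
enum SessionState { CONNECTED, DISCONNECTED, EXPIRED }

/**
 * Hypothetical helper: holds the current client handle and replaces it
 * once the session is reported as expired. T is the client type; with
 * ZooKeeper it would be org.apache.zookeeper.ZooKeeper (assumption).
 */
final class SessionRecovery<T extends AutoCloseable> {
    private final Supplier<T> factory;
    private final AtomicReference<T> current;

    SessionRecovery(Supplier<T> factory) {
        this.factory = factory;
        this.current = new AtomicReference<>(factory.get());
    }

    /** Returns the handle that callers should use right now. */
    T client() {
        return current.get();
    }

    /**
     * Intended to be called from the session watcher callback.
     * Returns true if the handle was replaced.
     */
    boolean onState(SessionState state) {
        if (state != SessionState.EXPIRED) {
            // Connected or disconnected: keep the handle and leave any
            // reconnect attempts to the client library.
            return false;
        }
        // Expired: the old handle is unusable, so create a new one
        // and close the old one on a best-effort basis.
        T old = current.getAndSet(factory.get());
        try {
            old.close();
        } catch (Exception e) {
            // Nothing useful can be done with a dead handle.
        }
        return true;
    }
}
```

A new session starts without the ephemeral nodes and watches of the expired one, so the application still has to re-create them after the swap. This only limits the impact of the expiry; it does not restore the retry behavior described for 3.9.2.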