RaulGracia opened a new pull request #2761: URL: https://github.com/apache/bookkeeper/pull/2761
### Motivation

Added `TCP_USER_TIMEOUT` to the Epoll channel configuration to limit how long a connection keeps sending keepalives to a non-responding Bookie.

### Changes

The original issue reported that in scenarios where Bookies may go down unexpectedly and change their IP (e.g., Kubernetes), the BookKeeper client may spend some time attempting to connect to the old IP of the restarted Bookie (see #2482 for details). To prevent this problem (in Epoll channels), we introduce the following changes:

- Epoll channels are now configured with `TCP_USER_TIMEOUT`. This parameter takes precedence over the underlying TCP keepalive configuration (see https://datatracker.ietf.org/doc/html/rfc5482), which may default to retrying for too long depending on the environment (e.g., 10-15 minutes in our experience).
- To avoid adding more configuration parameters, the existing `clientConnectTimeoutMillis` value in `ClientConfiguration` is used to set `TCP_USER_TIMEOUT`, since the two settings serve a similar purpose.

### Validation

We reproduced the original testing environment, in which this problem appears consistently:

- Cluster with 4 Bookies and 3 Kubernetes nodes, in addition to Pravega (https://pravega.io), which uses the BookKeeper client.
- Deployed an application to do IO to Pravega (and therefore to BookKeeper).
- Periodically shut down a Kubernetes node, so the BookKeeper pods on it are restarted as well.

With this test procedure and without the proposed PR, we consistently observe BookKeeper clients getting stuck trying to contact the old IPs of Bookies. With this change, we confirmed via logs that the configuration change takes effect, and we have not been able to reproduce the original problem so far after multiple node reboots.

Master Issue: #2482
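
For readers unfamiliar with the option, the sketch below shows `TCP_USER_TIMEOUT` at the plain socket level. It is not the code in this PR (which configures Netty Epoll channels in Java); it is an illustrative Python example, and the helper name `set_tcp_user_timeout` and the 10-second value are assumptions made for the demonstration. The option is only available on Linux, so the helper reports whether it could be applied.

```python
import socket


def set_tcp_user_timeout(sock: socket.socket, timeout_millis: int) -> bool:
    """Try to set TCP_USER_TIMEOUT (RFC 5482) on a TCP socket.

    Hypothetical helper for illustration. Returns True if the option was
    applied, False if the platform or kernel does not support it.
    """
    if timeout_millis < 0:
        raise ValueError("timeout_millis must be non-negative")
    # The constant is only defined on platforms that support it (Linux).
    option = getattr(socket, "TCP_USER_TIMEOUT", None)
    if option is None:
        return False
    try:
        # The value is the maximum time, in milliseconds, that transmitted
        # data may remain unacknowledged before the connection is closed.
        sock.setsockopt(socket.IPPROTO_TCP, option, timeout_millis)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Keepalive probes are also bounded by the user timeout once it is set.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
        applied = set_tcp_user_timeout(s, 10_000)  # assumed value: 10 seconds
        print("TCP_USER_TIMEOUT applied:", applied)
```

In the PR itself, the equivalent value is taken from `clientConnectTimeoutMillis` rather than from a hard-coded constant.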
