capistrant opened a new issue, #18629: URL: https://github.com/apache/druid/issues/18629
## Fabric8 KubernetesClient Overview

[Fabric8 KubernetesClient](https://github.com/fabric8io/kubernetes-client) is the client library we use in the [druid-kubernetes-overlord-extensions](https://druid.apache.org/docs/latest/development/extensions-core/k8s-jobs).

### Underlying HTTP Client

Fabric8 uses an underlying HTTP client for client/server interaction with the K8s cluster. This HTTP client is pluggable. Fabric8 supports four different clients as of this writing: `['vert.x', 'okhttp', 'jetty', 'native-jdk']`. `vert.x` is currently the default client used by Fabric8.

## Druid History with the Fabric8 client

#17913 switched Druid to use `vert.x`. #18013 brought Druid up to date with the latest Fabric8 versions.

## Druid's Path Forward

This issue exists because Druid operators have reported problems with both the `vert.x` and `okhttp` clients in production Druid clusters:

* vert.x: Failures communicating with the API server due to unhealthy connections in the connection pool, leading to sporadic task failures.
* okhttp: Large numbers of threads being created and consuming memory when many tasks are being launched.

The Druid developer community wants to reach a state where a stable default HTTP client and configuration is identified, simplifying configuration and distribution packaging. In the interim, Druid operators can select the HTTP client and configure some of its parameters. This will help operators tailor the HTTP client to their use case and provide feedback to the Druid developer community on what works well in practice.

### Known Issues

#### [vert.x](https://github.com/fabric8io/kubernetes-client/tree/main/httpclient-vertx)

* K8s API requests fail with `ConnectionClosed` exceptions due to unhealthy connections in the underlying connection pool.
* This can lead to sporadic task failures.
* The issue appears to be caused by connections being closed on the server side while the client does not clean them up before trying to reuse them in later requests.
* A [related vert.x issue](https://github.com/fabric8io/kubernetes-client/issues/7252) has been opened with Fabric8 to investigate exposing more configuration knobs to tune the connection pool.

#### [okhttp](https://github.com/fabric8io/kubernetes-client/tree/main/httpclient-okhttp)

* With the default configuration, the client creates a large number of threads, which can lead to memory issues if there are many tasks being launched.
* The underlying issue appears to be related to an unbounded thread pool being used by the client. We are exposing experimental configuration knobs to tune the thread pool size in an attempt to mitigate this issue.
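
To make the thread-growth concern above concrete, here is a minimal sketch that uses only the JDK. It is not Druid or Fabric8 code: the class and method names (`ThreadPoolGrowthSketch`, `unboundedPool`, `boundedPool`, `peakThreads`) are hypothetical, and it assumes the unbounded pool behaves like a cached-style executor that hands each task directly to a thread. It compares how many threads each pool creates when 50 slow requests are in flight at once.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ThreadPoolGrowthSketch {

    // Assumption: an "unbounded" pool with no core threads, no upper limit,
    // and direct handoff, so every task that finds no idle thread gets a new one.
    static ThreadPoolExecutor unboundedPool() {
        return new ThreadPoolExecutor(
            0, Integer.MAX_VALUE, 60L, TimeUnit.SECONDS, new SynchronousQueue<>());
    }

    // A bounded alternative: at most maxThreads threads, extra tasks wait in a queue.
    static ThreadPoolExecutor boundedPool(int maxThreads) {
        return new ThreadPoolExecutor(
            maxThreads, maxThreads, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    }

    // Submits `tasks` blocking tasks and returns the largest number of threads
    // the pool held while all of them were outstanding.
    static int peakThreads(ThreadPoolExecutor pool, int tasks) {
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    release.await(); // stands in for a slow in-flight API call
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        int peak = pool.getLargestPoolSize();
        release.countDown(); // let every task finish
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } finally {
            pool.shutdown();
        }
        return peak;
    }

    public static void main(String[] args) {
        System.out.println("unbounded peak threads: " + peakThreads(unboundedPool(), 50));
        System.out.println("bounded peak threads:   " + peakThreads(boundedPool(4), 50));
    }
}
```

In this sketch the unbounded pool ends up with one thread per outstanding task, while the bounded pool stays at its configured maximum and queues the rest. Capping the pool trades thread count for queueing latency, so a suitable size depends on how many tasks a cluster launches concurrently.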
