AlexanderKM opened a new pull request, #17760: URL: https://github.com/apache/pinot/pull/17760
## Context

We have observed that MSE queries start failing after some time (30+ days, when our internal certs expire). Snippet of an example error:

```
failed: Connection refused
io.grpc.netty.shaded.io.netty.channel.unix.Errors.newConnectException0(Errors.java:166)
io.grpc.netty.shaded.io.netty.channel.unix.Errors.handleConnectErrno(Errors.java:131)
io.grpc.netty.shaded.io.netty.channel.unix.Socket.finishConnect(Socket.java:359)
io.grpc.netty.shaded.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:715)
finishConnect(..) failed: Connection refused
io.grpc.netty.shaded.io.netty.channel.unix.Errors.newConnectException0(Errors.java:166)
io.grpc.netty.shaded.io.netty.channel.unix.Errors.handleConnectErrno(Errors.java:131)
io.grpc.netty.shaded.io.netty.channel.unix.Socket.finishConnect(Socket.java:359)
io.grpc.netty.shaded.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:715)
```

After some digging, it looks like there is a mismatch in the way Pinot sets the SSL provider for gRPC connections (i.e. the connections used for MSE queries).

## The Fix

We can take inspiration from the [BaseGrpcQueryClient](https://github.com/apache/pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/utils/grpc/BaseGrpcQueryClient.java#L108-L114):

```java
if (tlsConfig.getSslProvider() != null) {
  sslContextBuilder =
      GrpcSslContexts.configure(sslContextBuilder, SslProvider.valueOf(tlsConfig.getSslProvider()));
} else {
  sslContextBuilder = GrpcSslContexts.configure(sslContextBuilder);
}
```

Note that when the sslProvider is not null, we want to explicitly call `GrpcSslContexts.configure(sslContextBuilder, SslProvider.valueOf(tlsConfig.getSslProvider()));`.

When `GrpcSslContexts.configure(sslContextBuilder);` is called without an explicit sslProvider, it silently uses the default SSL provider, which is OpenSSL whenever it is available, through this code:

```java
@CanIgnoreReturnValue
public static SslContextBuilder configure(SslContextBuilder builder) {
  return configure(builder, defaultSslProvider());
}
```

[[source here](https://github.com/grpc/grpc-java/blob/master/netty/src/main/java/io/grpc/netty/GrpcSslContexts.java#L146-L148)]

So even though we were setting the SSL provider while building the context, that setting is effectively overridden by this later call. Thus, the fix here is to mimic the gRPC client code and always call `GrpcSslContexts.configure(sslContextBuilder, sslProvider)`.

The underlying root cause is that the default OpenSSL provider loads the raw key bytes and cert bytes into native memory only ONCE and does not reload them later. The JDK provider stores references to the KeyManager and TrustManager objects, which can be refreshed in the background to support the reload use case from [RenewableTlsUtils](https://github.com/apache/pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/utils/tls/RenewableTlsUtils.java).

## Testing

Confirmed the MSE queries work with our deployments 👍

Tagging @xiangfu0 since I've seen some recent PRs related to this type of code:
https://github.com/apache/pinot/pull/17358
https://github.com/apache/pinot/pull/17559
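
For reviewers who want to see the root cause in isolation, here is a toy model of the two behaviours described above. It is only a sketch and uses no Pinot, gRPC, or Netty code: `RenewableStore`, `SnapshotContext`, and `ReferenceContext` are hypothetical names, and the certificate is a plain string standing in for real key material.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Toy model only: contrasts copy-once and live-reference handling of key material. */
public class ReloadSemanticsDemo {

    /** Hypothetical stand-in for a key store that a background task refreshes. */
    static final class RenewableStore {
        private final AtomicReference<String> cert;

        RenewableStore(String initial) {
            cert = new AtomicReference<>(initial);
        }

        String current() {
            return cert.get();
        }

        void renew(String next) {
            cert.set(next);
        }
    }

    /** Copies the certificate once at construction, like a provider that loads bytes a single time. */
    static final class SnapshotContext {
        private final String cert;

        SnapshotContext(RenewableStore store) {
            this.cert = store.current();
        }

        String certInUse() {
            return cert;
        }
    }

    /** Keeps a reference to the store, like a provider that holds manager objects. */
    static final class ReferenceContext {
        private final RenewableStore store;

        ReferenceContext(RenewableStore store) {
            this.store = store;
        }

        String certInUse() {
            return store.current();
        }
    }

    public static void main(String[] args) {
        RenewableStore store = new RenewableStore("cert-v1");
        SnapshotContext snapshot = new SnapshotContext(store);
        ReferenceContext reference = new ReferenceContext(store);

        store.renew("cert-v2"); // simulates a background certificate renewal

        System.out.println("snapshot:  " + snapshot.certInUse());
        System.out.println("reference: " + reference.certInUse());
    }
}
```

After the renewal, the snapshot context still reports `cert-v1` while the reference context reports `cert-v2`. This mirrors why a context that copied its certs at build time keeps presenting expired ones, while a context that holds references picks up the renewed ones.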
