AlexanderKM opened a new pull request, #17760:
URL: https://github.com/apache/pinot/pull/17760

   ## Context
   
   We have observed that MSE queries start failing after some time (after 30+ days, when our internal certs expire).
   
   Snippet of an example error:
   ```
   failed: Connection refused
   io.grpc.netty.shaded.io.netty.channel.unix.Errors.newConnectException0(Errors.java:166)
   io.grpc.netty.shaded.io.netty.channel.unix.Errors.handleConnectErrno(Errors.java:131)
   io.grpc.netty.shaded.io.netty.channel.unix.Socket.finishConnect(Socket.java:359)
   io.grpc.netty.shaded.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:715)
   finishConnect(..) failed: Connection refused
   io.grpc.netty.shaded.io.netty.channel.unix.Errors.newConnectException0(Errors.java:166)
   io.grpc.netty.shaded.io.netty.channel.unix.Errors.handleConnectErrno(Errors.java:131)
   io.grpc.netty.shaded.io.netty.channel.unix.Socket.finishConnect(Socket.java:359)
   io.grpc.netty.shaded.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:715)
   ```
   
   After some digging, it looks like there is a mismatch in the way Pinot sets the SSL provider for gRPC connections (i.e. the connections used for MSE queries).
   
   ## The Fix
   
   We can take inspiration from [BaseGrpcQueryClient](https://github.com/apache/pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/utils/grpc/BaseGrpcQueryClient.java#L108-L114):
   
   ```java
       if (tlsConfig.getSslProvider() != null) {
         sslContextBuilder =
             GrpcSslContexts.configure(sslContextBuilder, SslProvider.valueOf(tlsConfig.getSslProvider()));
       } else {
         sslContextBuilder = GrpcSslContexts.configure(sslContextBuilder);
       }
   ```
   
   Note that when the sslProvider is not null, we want to explicitly call `GrpcSslContexts.configure(sslContextBuilder, SslProvider.valueOf(tlsConfig.getSslProvider()));`.
   
   When `GrpcSslContexts.configure(sslContextBuilder);` is called without an explicit sslProvider, it silently falls back to the default SSL provider (OpenSSL, when it is available) via this code:
   ```java
       @CanIgnoreReturnValue
       public static SslContextBuilder configure(SslContextBuilder builder) {
           return configure(builder, defaultSslProvider());
       }
   ```
   [[source 
here](https://github.com/grpc/grpc-java/blob/master/netty/src/main/java/io/grpc/netty/GrpcSslContexts.java#L146-L148)]
 
   
   So even when we were building and setting the SSL provider, that setting was effectively overridden later.
   
   Thus, the fix here is to mimic the gRPC client code and always call `GrpcSslContexts.configure(sslContextBuilder, sslProvider)`.
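   
   A minimal, dependency-free sketch of the mismatch described above. `SslProviderChoice`, `Provider`, `libraryDefault`, and `resolve` are hypothetical stand-ins (not Pinot, Netty, or grpc-java APIs). They only model the branch quoted from BaseGrpcQueryClient, assuming the library default is OpenSSL whenever it is available:

   ```java
   final class SslProviderChoice {
     /** Hypothetical stand-in for the provider enum; not the real Netty type. */
     enum Provider { JDK, OPENSSL }

     /** Models configure(builder): the configured value is never consulted. */
     static Provider libraryDefault(boolean openSslAvailable) {
       return openSslAvailable ? Provider.OPENSSL : Provider.JDK;
     }

     /** Models the client branch: an explicit setting wins, otherwise the default applies. */
     static Provider resolve(String configured, boolean openSslAvailable) {
       if (configured != null) {
         return Provider.valueOf(configured);
       }
       return libraryDefault(openSslAvailable);
     }
   }
   ```

   With a configured value of `"JDK"` and OpenSSL available, `libraryDefault(true)` yields `OPENSSL` while `resolve("JDK", true)` yields `JDK`, which is the difference this PR addresses.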
   
   The underlying root cause is that the default OpenSSL provider loads the raw key bytes and cert bytes into native memory only ONCE and does not reload them later. The JDK provider instead stores references to the KeyManager and TrustManager objects, which can be refreshed in the background to support the reload use case from [RenewableTlsUtils](https://github.com/apache/pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/utils/tls/RenewableTlsUtils.java).
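   
   To illustrate why holding a reference makes reloads possible, here is a sketch that uses only JDK classes. `SwappableKeyManager` and `swap` are hypothetical names, and this is not a copy of how RenewableTlsUtils is implemented. It only shows that anything holding the wrapper sees the renewed key material once the delegate is replaced:

   ```java
   import java.net.Socket;
   import java.security.Principal;
   import java.security.PrivateKey;
   import java.security.cert.X509Certificate;
   import java.util.concurrent.atomic.AtomicReference;
   import javax.net.ssl.X509KeyManager;

   /** Hypothetical wrapper: callers keep this object while the delegate behind it is replaced. */
   final class SwappableKeyManager implements X509KeyManager {
     private final AtomicReference<X509KeyManager> delegate;

     SwappableKeyManager(X509KeyManager initial) {
       this.delegate = new AtomicReference<>(initial);
     }

     /** Called by a background reloader (assumed) after new certs are read from disk. */
     void swap(X509KeyManager renewed) {
       delegate.set(renewed);
     }

     @Override
     public String[] getClientAliases(String keyType, Principal[] issuers) {
       return delegate.get().getClientAliases(keyType, issuers);
     }

     @Override
     public String chooseClientAlias(String[] keyType, Principal[] issuers, Socket socket) {
       return delegate.get().chooseClientAlias(keyType, issuers, socket);
     }

     @Override
     public String[] getServerAliases(String keyType, Principal[] issuers) {
       return delegate.get().getServerAliases(keyType, issuers);
     }

     @Override
     public String chooseServerAlias(String keyType, Principal[] issuers, Socket socket) {
       return delegate.get().chooseServerAlias(keyType, issuers, socket);
     }

     @Override
     public X509Certificate[] getCertificateChain(String alias) {
       return delegate.get().getCertificateChain(alias);
     }

     @Override
     public PrivateKey getPrivateKey(String alias) {
       return delegate.get().getPrivateKey(alias);
     }
   }
   ```

   A provider that copies the key and cert bytes once at context creation has no such indirection, so a later `swap` would not reach it.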
   
   ## Testing
   
   Confirmed the MSE queries work with our deployments 👍 
   
   
   Tagging @xiangfu0 since I've seen some recent PRs related to this type of code:
   https://github.com/apache/pinot/pull/17358
   https://github.com/apache/pinot/pull/17559


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

