fapifta opened a new pull request, #5649: URL: https://github.com/apache/ozone/pull/5649
## What changes were proposed in this pull request?

With long-running clients, the rootCA certificate of Ozone's internal PKI system may be rotated while the client is still alive, and service components then renew their certificates with the new CA. If the RPCClient code tries to connect to the same DataNode after the xceiver has expired in the XceiverClientManager cache, the new connection to that DN fails, because the client has no mechanism to check whether a rootCA certificate rotation has happened.

In this PR I would like to add a custom TrustManager to the client. It keeps the former behaviour and uses the CA certificates that Ozone Manager provides to the RPC client. In addition, the new TrustManager has an injected method that makes a remote RPC call, through which it can re-fetch a ServiceInfoEx object once it runs into a certificate verification failure, and it retries once with the newly fetched list of CA certificates.

Some other small changes were bundled with this change:
- Removed the connect(String encodedToken) method from the low-level RPC clients. We send tokens in another way, so it was not used. Due to this change a couple of test classes have also changed.
- Removed SCMClientConfig from the XceiverClientManager class, as it was not used.

## What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-8958

## How was this patch tested?

In the TestOzoneContainerWithTLS test class, I have added a new test method that checks the following:
- Once I create a client, it is able to connect to the RPC server of a DN that deals with clients.
- If the rootCA certificate is expired, but the client is not evicted from the cache (held live by not releasing it), then the already established channel works without problems (unless an unexpected new SSL handshake happens, but that is highly unlikely in the short timeframe the test runs within).
- The test verifies that after the CA certificate has expired and the client is evicted from the cache, a new client is created, and it tries to re-fetch the certificate from the remote provider. (Note that in the test the in-memory provider and the remote provider are the same helper object, so the operation is first expected to fail.)
- After the certificates are renewed in the helper object, a new client again succeeds in operating with the server side, as it refreshes the certificates and retries the operation properly.

With this I think the test demonstrates that the system works properly under the named conditions.
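
For readers who want a concrete picture of the retry-once TrustManager described in the proposal, here is a minimal sketch. It is not the class added by this PR. `RefetchingTrustManager`, `DelegateFactory`, `fromCaCerts`, and the `Supplier` standing in for the remote call that re-fetches the CA list are all hypothetical names chosen for illustration, and only standard JSSE APIs are used.

```java
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.cert.CertificateException;
import java.security.cert.X509Certificate;
import java.util.List;
import java.util.function.Supplier;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

/** Hypothetical sketch of a trust manager that re-fetches CA certs and retries once. */
class RefetchingTrustManager implements X509TrustManager {

  /** Hypothetical seam: turns a CA list into a delegate trust manager. */
  interface DelegateFactory {
    X509TrustManager create(List<X509Certificate> caCerts) throws CertificateException;
  }

  // Assumption: stands in for the injected remote call that returns the current CA list.
  private final Supplier<List<X509Certificate>> remoteCaSupplier;
  private final DelegateFactory factory;
  private volatile X509TrustManager delegate;

  RefetchingTrustManager(List<X509Certificate> initialCaCerts,
      Supplier<List<X509Certificate>> remoteCaSupplier,
      DelegateFactory factory) throws CertificateException {
    this.remoteCaSupplier = remoteCaSupplier;
    this.factory = factory;
    this.delegate = factory.create(initialCaCerts);
  }

  @Override
  public void checkServerTrusted(X509Certificate[] chain, String authType)
      throws CertificateException {
    try {
      delegate.checkServerTrusted(chain, authType);
    } catch (CertificateException firstFailure) {
      // Verification failed: re-fetch the CA list, rebuild the delegate, retry exactly once.
      // A second failure propagates to the caller.
      delegate = factory.create(remoteCaSupplier.get());
      delegate.checkServerTrusted(chain, authType);
    }
  }

  @Override
  public void checkClientTrusted(X509Certificate[] chain, String authType)
      throws CertificateException {
    delegate.checkClientTrusted(chain, authType);
  }

  @Override
  public X509Certificate[] getAcceptedIssuers() {
    return delegate.getAcceptedIssuers();
  }

  /** Builds a standard JSSE trust manager that trusts exactly the given CA certificates. */
  static X509TrustManager fromCaCerts(List<X509Certificate> caCerts)
      throws CertificateException {
    try {
      KeyStore keyStore = KeyStore.getInstance(KeyStore.getDefaultType());
      keyStore.load(null, null); // initialize an empty in-memory key store
      int index = 0;
      for (X509Certificate ca : caCerts) {
        keyStore.setCertificateEntry("ca-" + index++, ca);
      }
      TrustManagerFactory tmf =
          TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
      tmf.init(keyStore);
      for (TrustManager tm : tmf.getTrustManagers()) {
        if (tm instanceof X509TrustManager) {
          return (X509TrustManager) tm;
        }
      }
    } catch (GeneralSecurityException | IOException e) {
      throw new CertificateException("Could not build trust manager", e);
    }
    throw new CertificateException("No X509TrustManager available");
  }
}
```

Under these assumptions, a client would construct it as `new RefetchingTrustManager(initialCas, remoteSupplier, RefetchingTrustManager::fromCaCerts)` and hand it to its TLS context. Keeping the delegate factory injectable is a choice made for this sketch, so that the retry logic can be exercised with stub trust managers without generating real certificates.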
