fapifta opened a new pull request, #5649:
URL: https://github.com/apache/ozone/pull/5649

   ## What changes were proposed in this pull request?
   Long-running clients can outlive a rotation of the rootCA certificate of 
Ozone's internal PKI system, after which service components renew their 
certificates with the new CA.
   If the RPCClient code then reconnects to the same DataNode after the 
xceiver client has expired from the XceiverClientManager cache, the new 
connection to that DN fails, because the client has no mechanism to check 
whether a rootCA certificate rotation has happened.
   
   This PR adds a custom TrustManager to the client. It keeps the former 
behaviour and uses the CA certificates that Ozone Manager provides to the RPC 
client. In addition, the new TrustManager is given a callback that makes a 
remote RPC call to re-fetch a ServiceInfoEx object when it runs into a 
certificate verification failure, and it retries the verification once with 
the newly fetched list of CA certificates.
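
   The retry-once behaviour described above can be sketched roughly as 
follows. This is only an illustration under stated assumptions, not the code 
in this PR: the class name `ReloadingTrustManager`, the `caFetcher` supplier 
(standing in for the remote call that re-fetches the CA list) and the 
`builder` function are all hypothetical names.

```java
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.cert.CertificateException;
import java.security.cert.X509Certificate;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

/** Hypothetical sketch: a trust manager that re-fetches CA certs and retries once. */
public class ReloadingTrustManager implements X509TrustManager {
  // Assumption: stands in for the remote call that returns the current CA list.
  private final Supplier<List<X509Certificate>> caFetcher;
  // Turns a CA list into a delegate trust manager; injectable for testing.
  private final Function<List<X509Certificate>, X509TrustManager> builder;
  private volatile X509TrustManager delegate;

  public ReloadingTrustManager(
      List<X509Certificate> initialCas,
      Supplier<List<X509Certificate>> caFetcher,
      Function<List<X509Certificate>, X509TrustManager> builder) {
    this.caFetcher = caFetcher;
    this.builder = builder;
    this.delegate = builder.apply(initialCas);
  }

  /** Builds a standard JSSE trust manager that trusts exactly the given CAs. */
  public static X509TrustManager fromCaCerts(List<X509Certificate> cas) {
    try {
      KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
      ks.load(null, null); // start from an empty in-memory key store
      int i = 0;
      for (X509Certificate ca : cas) {
        ks.setCertificateEntry("ca-" + i++, ca);
      }
      TrustManagerFactory tmf =
          TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
      tmf.init(ks);
      for (TrustManager tm : tmf.getTrustManagers()) {
        if (tm instanceof X509TrustManager) {
          return (X509TrustManager) tm;
        }
      }
      throw new IllegalStateException("No X509TrustManager available");
    } catch (GeneralSecurityException | IOException e) {
      throw new IllegalStateException("Could not build trust manager", e);
    }
  }

  @Override
  public void checkServerTrusted(X509Certificate[] chain, String authType)
      throws CertificateException {
    try {
      delegate.checkServerTrusted(chain, authType);
    } catch (CertificateException firstFailure) {
      // Possibly a rotated root CA: refresh the CA list and retry exactly once.
      delegate = builder.apply(caFetcher.get());
      delegate.checkServerTrusted(chain, authType);
    }
  }

  @Override
  public void checkClientTrusted(X509Certificate[] chain, String authType)
      throws CertificateException {
    try {
      delegate.checkClientTrusted(chain, authType);
    } catch (CertificateException firstFailure) {
      delegate = builder.apply(caFetcher.get());
      delegate.checkClientTrusted(chain, authType);
    }
  }

  @Override
  public X509Certificate[] getAcceptedIssuers() {
    return delegate.getAcceptedIssuers();
  }
}
```

   In such a sketch, production code would pass 
`ReloadingTrustManager::fromCaCerts` as the `builder`, while a test can inject 
a fake. If the second verification also fails, the `CertificateException` 
propagates to the caller, so a genuinely untrusted peer is still rejected.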
   
   Some other small changes are bundled with this change:
   - Removed the connect(String encodedToken) method from the low-level RPC 
clients. Tokens are sent in another way, so the method was not used. Because 
of this change, a couple of test classes have also changed.
   - Removed SCMClientConfig from the XceiverClientManager class, as it was 
not used.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-8958
   
   ## How was this patch tested?
   I have added a new test method to the TestOzoneContainerWithTLS test class 
that checks the following:
   - a newly created client is able to connect to the client-facing RPC server 
of a DN.
   - if the rootCA certificate has expired but the client is not evicted from 
the cache (kept alive by not releasing it), the already established channel 
keeps working without problems (unless an unexpected new SSL handshake 
happens, which is highly unlikely in the short time frame of the test).
   - after the CA certificate has expired and the client is evicted from the 
cache, a new client is created, and it tries to re-fetch the certificate from 
the remote provider. (Note that in the test the in-memory and the remote 
provider are the same helper object, so the operation is expected to fail at 
first.)
   - after the certificates are renewed in the helper object, a new client 
again succeeds in operating against the server side, as it refreshes the 
certificates and retries the operation properly.
   
   With this, I think the test demonstrates that the system works properly 
under the conditions described above.
   

