sadanand48 commented on PR #6781: URL: https://github.com/apache/ozone/pull/6781#issuecomment-2318686050
> I came to the realization that we are talking about a scenario where all three SCM lost their certificates, and therefore the Ratis server could not come up, which also means that their SCMSecurityProtocolServers has not started either...

Right @fapifta, we cannot go through the usual way of fetching the SCM cert info via Ratis here, which is why I implemented a hacky way to do it in my last updated patch. In the ideal flow, the SCMSecurityClient is initialised in the SCM startup code path, which is run by the SCM node itself. Each SCM has its own RocksDB, and if we are guaranteed that the certs are persisted to each RocksDB, I just get the path of scm.db from the SCM configs, open the SCM RocksDB instance in read-only mode, read the cert, and write it to a local file.

I guess this still has problems because:

1. We are assuming that all SCMs have their RocksDB updated and on the same page, which might not be true, since a normal write only needs to be committed by a majority.
2. While implementing this I also faced many issues with dependencies, Bouncy Castle violations, etc.

The problem here, as mentioned by @nandakumar131, is very peculiar: it is restricted to the user accidentally deleting their SCM certs but not the private keys, which is quite rare. To handle such rare cases, the tool might be more helpful than trying to automate the logic.

Thanks for the inputs, I will revert the last commit to only add the tool.
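For reference, the last step of the approach described above (writing a recovered cert to a local file) could look roughly like the sketch below. It is not the code from the patch. It assumes the cert bytes read from scm.db are DER-encoded X.509 and that the target is a PEM file; the RocksDB read itself is left out, and `CertPemWriter`, `toPem` and `writePem` are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

/** Hypothetical helper: turns DER certificate bytes into a PEM file on local disk. */
final class CertPemWriter {
    private static final String BEGIN = "-----BEGIN CERTIFICATE-----";
    private static final String END = "-----END CERTIFICATE-----";

    private CertPemWriter() { }

    /** Wraps DER bytes in PEM armor, with the Base64 body split into 64-character lines. */
    static String toPem(byte[] der) {
        Base64.Encoder encoder =
            Base64.getMimeEncoder(64, "\n".getBytes(StandardCharsets.US_ASCII));
        return BEGIN + "\n" + encoder.encodeToString(der) + "\n" + END + "\n";
    }

    /**
     * Writes the PEM text to the target path, creating parent directories if needed.
     * Assumption: the caller has already read the DER bytes from the SCM DB.
     */
    static Path writePem(byte[] der, Path target) throws IOException {
        Files.createDirectories(target.toAbsolutePath().getParent());
        return Files.write(target, toPem(der).getBytes(StandardCharsets.US_ASCII));
    }
}
```

This only covers the file-writing part. It does not address the concern that a given SCM's RocksDB may be behind the majority-committed state.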
