fapifta opened a new pull request, #5122: URL: https://github.com/apache/ozone/pull/5122
## What changes were proposed in this pull request?

A detailed description of the problem can be found in the JIRA ticket.

The TLDR: the deadlock happens because SCM startup calls the listCA method on the SCM's cert client. That method is synchronized, but it connects to the leader SCM to get its data, and because of how the SCM Security protocol server is implemented it waits, with no timeout, until the leader comes out of safe mode. In the meantime, as the SCM's security protocol server is already started, two other services have already filed requests, and the server uses the SCM's cert client to get the data they asked for. Processing of those requests cannot finish, because it cannot enter the other synchronized methods of the certificate client while listCA holds the lock.

The proposed solution is to separate the two locks. listCA and the related methods are used only from SCM code, and from container operation clients for recovery. As recovery is not initiated during safe mode, a separate lock for the listCA method and all other methods that access pemEncodedCACerts solves the problem: it unblocks the other client operations while the server side is still working on getting the certificates properly persisted.

## What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-9061

## How was this patch tested?

I have no idea how we can write a stable reproduction case. The threads are scheduled by the JVM, and we would need to run everything in a particular order, with two other requests each being at a specific stage of processing at the right time. I think this cannot be achieved by any code hackery without more significant modifications to the production code in question, which I believe is not worth it. However, if there is a cheap and easy way, I am open to learning about it.
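For readers who want to see the idea in isolation, here is a minimal sketch of the lock separation described above. It is not the code from this patch: the class `SplitLockCertClient`, its methods, the placeholder certificate strings, and the latch that stands in for "the leader SCM has left safe mode" are all hypothetical. The only point is that a call guarded by the instance monitor can complete while another thread is parked inside a method guarded by a dedicated lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a certificate client; NOT the actual Ozone class.
class SplitLockCertClient {
  // Dedicated lock for the CA list, separate from the instance monitor.
  private final Object caListLock = new Object();
  private final List<String> pemEncodedCACerts = new ArrayList<>();

  // Guarded by caListLock. The latch simulates the remote call that blocks
  // until the leader leaves safe mode (an assumption made for this sketch).
  List<String> listCA(CountDownLatch leaderOutOfSafeMode)
      throws InterruptedException {
    synchronized (caListLock) {
      if (pemEncodedCACerts.isEmpty()) {
        leaderOutOfSafeMode.await();
        pemEncodedCACerts.add("placeholder-ca-pem");
      }
      return new ArrayList<>(pemEncodedCACerts);
    }
  }

  // Guarded by the instance monitor, so it never waits for caListLock.
  synchronized String getCertificate() {
    return "placeholder-cert-pem";
  }

  // Returns true if getCertificate() completed while the listCA() caller
  // had not yet finished, and that caller finished once the latch opened.
  static boolean otherCallsProceedWhileListCABlocks() {
    SplitLockCertClient client = new SplitLockCertClient();
    CountDownLatch leaderOutOfSafeMode = new CountDownLatch(1);
    CountDownLatch startupRunning = new CountDownLatch(1);
    Thread startup = new Thread(() -> {
      try {
        startupRunning.countDown();
        client.listCA(leaderOutOfSafeMode);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    startup.start();
    try {
      startupRunning.await(1, TimeUnit.SECONDS);
      // Uses a different lock than listCA(), so this returns immediately.
      String cert = client.getCertificate();
      // The startup thread cannot finish before the latch is released.
      boolean startupStillWaiting = startup.isAlive();
      leaderOutOfSafeMode.countDown();
      startup.join(1000);
      return cert != null && startupStillWaiting && !startup.isAlive();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(otherCallsProceedWhileListCABlocks());
  }
}
```

If `listCA` were declared `synchronized` on the instance like `getCertificate`, the second call could be stuck behind the first for as long as the latch stays closed, which is the shape of the deadlock described in the ticket.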
