siddhantsangwan opened a new pull request, #3535: URL: https://github.com/apache/ozone/pull/3535
## What changes were proposed in this pull request?

ContainerBalancer calls `balancingThread.join()` in ContainerBalancer#stopBalancingThread. Callers of this method acquire the only lock in this class and still hold it when they call the method. If the balancing thread tries to acquire the same lock at this time, we have a deadlock.

For example, SCMClientProtocolServer#stopContainerBalancer() causes the calling thread to wait for the balancing thread to join in ContainerBalancer#stopBalancingThread. If the balancing thread now calls `isBalancerRunning()` in ContainerBalancer#balance, the two threads deadlock: the balancing thread is blocked waiting to acquire the lock, while the other thread holds the lock and waits for the balancing thread to finish.

Changes: release the lock in callers of ContainerBalancer#stopBalancingThread before this method is called, and remove locking in `isBalancerRunning()`.

## What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-6928

## How was this patch tested?

A basic UT that starts and then immediately stops the balancer. In the existing code, this leads to a deadlock.
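
For readers unfamiliar with the pattern, here is a minimal sketch of the fix described above. It is not the actual Ozone code: the class `BalancerSketch`, its `start()` and `stop()` methods, and the use of a `volatile` flag are all hypothetical stand-ins, chosen only to illustrate releasing the lock before `join()` and reading the running state without the lock.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for ContainerBalancer; names and fields are assumptions. */
final class BalancerSketch {
  private final ReentrantLock lock = new ReentrantLock();
  // Assumption: a volatile flag makes the lock-free read below safe.
  private volatile boolean running;
  private Thread balancingThread;

  void start() {
    lock.lock();
    try {
      running = true;
      balancingThread = new Thread(this::balance, "balancer-sketch");
      balancingThread.start();
    } finally {
      lock.unlock();
    }
  }

  /** Reads the flag without taking the lock, so the balancing thread never blocks here. */
  boolean isBalancerRunning() {
    return running;
  }

  /** Loop body of the balancing thread; polls the flag on every iteration. */
  private void balance() {
    while (isBalancerRunning()) {
      try {
        Thread.sleep(10);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
    }
  }

  void stop() {
    Thread toJoin;
    lock.lock();
    try {
      if (!running) {
        return;
      }
      running = false;
      toJoin = balancingThread;
      balancingThread = null;
    } finally {
      // Release the lock BEFORE joining; joining while holding it is the deadlock.
      lock.unlock();
    }
    toJoin.interrupt();
    try {
      toJoin.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

Calling `start()` followed immediately by `stop()` on this sketch returns promptly. If `join()` were moved inside the locked region and `isBalancerRunning()` took the same lock, the same call sequence could hang in the way the description explains.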
