siddhantsangwan opened a new pull request, #3535:
URL: https://github.com/apache/ozone/pull/3535

   ## What changes were proposed in this pull request?
   
   ContainerBalancer#stopBalancingThread calls `balancingThread.join()`. 
Callers of this method acquire the only lock in this class and still hold it 
while waiting on the join. If the balancing thread itself tries to acquire the 
lock at this time, we have a deadlock.
   
   For example, SCMClientProtocolServer#stopContainerBalancer() causes the 
calling thread to wait in ContainerBalancer#stopBalancingThread for the 
balancing thread to finish. If the balancing thread now calls 
`isBalancerRunning()` in ContainerBalancer#balance, the two threads deadlock: 
the balancing thread is blocked waiting to acquire the lock, while the other 
thread holds the lock and waits for the balancing thread to finish.
   
   Changes: Release the lock in callers of 
ContainerBalancer#stopBalancingThread before calling it. Remove locking in 
`isBalancerRunning()`.
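   
   The pattern behind these changes can be sketched as follows. This is a 
minimal illustration, not the actual ContainerBalancer code: the class 
`BalancerSketch` and its members are hypothetical, and the sketch assumes a 
`volatile` flag is enough to make the unlocked running check safe.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the balancer; names are illustrative only.
final class BalancerSketch {
  private final ReentrantLock lock = new ReentrantLock();
  // Assumption: a volatile flag lets the worker read state without the lock.
  private volatile boolean running;
  private Thread worker;

  void start() {
    lock.lock();
    try {
      if (running) {
        throw new IllegalStateException("already running");
      }
      running = true;
      worker = new Thread(this::balance, "balancer-sketch");
      worker.start();
    } finally {
      lock.unlock();
    }
  }

  // Lock-free check, so the worker can never block on the stopper's lock.
  boolean isRunning() {
    return running;
  }

  private void balance() {
    while (isRunning()) {
      try {
        Thread.sleep(10); // placeholder for one balancing iteration
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
    }
  }

  void stop() {
    Thread toJoin;
    lock.lock();
    try {
      if (!running) {
        return;
      }
      running = false;
      toJoin = worker;
      worker = null;
    } finally {
      lock.unlock(); // release before join(), never while waiting
    }
    try {
      toJoin.join(); // the lock is no longer held at this point
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

   The state change happens under the lock, but the join happens only after 
the lock is released, so the worker can always make progress and exit.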
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-6928
   
   ## How was this patch tested?
   
   A basic unit test that starts and then immediately stops the balancer. With 
the existing code, this leads to a deadlock.
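   
   A test for a deadlock is easier to work with when it fails instead of 
hanging. One way to get that, sketched here in plain Java, is to run the 
start-then-stop sequence on a separate thread and wait with a timeout. The 
helper `DeadlockGuard.finishesWithin` is a hypothetical name and not part of 
this patch or of Ozone.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical test helper: reports whether an action completes in time.
final class DeadlockGuard {
  private DeadlockGuard() { }

  static boolean finishesWithin(Runnable action, long timeoutMillis) {
    CountDownLatch done = new CountDownLatch(1);
    Thread runner = new Thread(() -> {
      try {
        action.run();
      } finally {
        done.countDown(); // signal completion even if the action throws
      }
    }, "deadlock-guard");
    runner.setDaemon(true); // a stuck action must not keep the JVM alive
    runner.start();
    try {
      // false means the action was still running when the timeout expired
      return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

   A test could wrap the start and stop calls in `finishesWithin(...)` and 
assert that it returns true. A deadlocked stop then shows up as a failed 
assertion after the timeout.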


