[ https://issues.apache.org/jira/browse/HDDS-6928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17556328#comment-17556328 ]

Siddhant Sangwan commented on HDDS-6928:
----------------------------------------

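This is a deadlock. ContainerBalancer#stopBalancingThread() calls 
balancingThread.join() while its callers, SCMService#stop() and 
ContainerBalancer#stopBalancer(), are still holding the lock. Note that the 
unlock() in the snippet below releases only the hold taken inside 
stopBalancingThread(); the caller's hold on the same lock stays in place 
while join() blocks.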

{code}
  private void stopBalancingThread() {
    Thread balancingThread;
    lock.lock();
    try {
      balancingThread = currentBalancingThread;
      currentBalancingThread = null;
    } finally {
      lock.unlock();
    }
    // wait for balancingThread to die
    if (balancingThread != null &&
        balancingThread.getId() != Thread.currentThread().getId()) {
      balancingThread.interrupt();
      try {
        balancingThread.join();
      } catch (InterruptedException exception) {
        Thread.currentThread().interrupt();
      }
    }
    LOG.info("Container Balancer stopped successfully.");
  }
{code} 
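
The pattern can be reproduced in isolation. The following is a minimal sketch, 
not Ozone code: the class JoinUnderLockDemo, its methods, and the worker are 
hypothetical, and it assumes the thread being joined must acquire the same lock 
before it can exit (which the diagnosis above implies but the snippet does not 
show). The joins are bounded by a timeout only so that the demo terminates; with 
an unbounded join() the first case would hang.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class JoinUnderLockDemo {

  /** Starts a hypothetical worker that needs {@code lock} before it can exit. */
  private static Thread startWorker(ReentrantLock lock, CountDownLatch started) {
    Thread worker = new Thread(() -> {
      started.countDown();
      lock.lock(); // blocks for as long as another thread holds the lock
      try {
        // work that requires the lock would go here
      } finally {
        lock.unlock();
      }
    }, "worker");
    worker.setDaemon(true);
    worker.start();
    return worker;
  }

  /**
   * Joins the worker while the caller still holds the lock.
   * Returns true if the worker exited within timeoutMs (must be > 0).
   */
  static boolean joinWhileHoldingLock(long timeoutMs) throws InterruptedException {
    ReentrantLock lock = new ReentrantLock();
    CountDownLatch started = new CountDownLatch(1);
    lock.lock();
    try {
      Thread worker = startWorker(lock, started);
      started.await();
      worker.join(timeoutMs); // worker cannot get the lock, so this times out
      return !worker.isAlive();
    } finally {
      lock.unlock();
    }
  }

  /**
   * Releases the lock first and only then joins the worker.
   * Returns true if the worker exited within timeoutMs (must be > 0).
   */
  static boolean joinAfterReleasingLock(long timeoutMs) throws InterruptedException {
    ReentrantLock lock = new ReentrantLock();
    CountDownLatch started = new CountDownLatch(1);
    Thread worker;
    lock.lock();
    try {
      worker = startWorker(lock, started);
      started.await();
    } finally {
      lock.unlock();
    }
    worker.join(timeoutMs); // lock is free, so the worker can finish
    return !worker.isAlive();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("joined under lock, worker finished: "
        + joinWhileHoldingLock(200));
    System.out.println("joined after unlock, worker finished: "
        + joinAfterReleasingLock(2000));
  }
}
```

In this sketch the first call reports false, because the worker is still 
blocked on the lock when the bounded join gives up, and the second reports 
true. Whether the right fix for HDDS-6928 is to move the join out of the locked 
region in the callers or to restructure the locking otherwise is not something 
this sketch decides.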

> ozone container balancer CLI went in hung state due to deadlock
> ---------------------------------------------------------------
>
>                 Key: HDDS-6928
>                 URL: https://issues.apache.org/jira/browse/HDDS-6928
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: SCM
>            Reporter: Nilotpal Nandi
>            Assignee: Siddhant Sangwan
>            Priority: Major
>
> Steps taken:
> ------------
> 1. Run the container balancer using the CLI; the balancer went into the running state.
> 2. Run an SCM failover.
> 3. Run the container balancer again using the CLI.
> The container balancer CLI (stop/status) went into a hung state.
>  
>  



