Ravi Kishore Valeti created HBASE-28834:
-------------------------------------------
Summary: Procedure queues & PE pool metrics
Key: HBASE-28834
URL: https://issues.apache.org/jira/browse/HBASE-28834
Project: HBase
Issue Type: Improvement
Reporter: Ravi Kishore Valeti

While investigating a production incident, we observed that some procedures were created but never executed until an HMaster failover.

- master-2 was active & rs-1 was holding meta
- 18:40 - A bunch of RSs (~80) were reported as crashed; SCPs were created & were being executed
- 19:51 - The balancer decided to move the meta region to another RS -> TRSP created -> meta region went offline
- 19:52 - The RS carrying meta crashed -> SCP created
- 19:52 - Both the TRSP & the SCP seemed stuck/not executing; there were no more logs related to these procedures
- 21:09 - Master failed over from master-2 to master-3; procs were loaded from the store & attached
- 21:17 - Once the TRSP for meta completed, meta came back online

I will post the logs shortly.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)