Ravi Kishore Valeti created HBASE-28834:
-------------------------------------------

             Summary: Procedure queues & PE pool metrics
                 Key: HBASE-28834
                 URL: https://issues.apache.org/jira/browse/HBASE-28834
             Project: HBase
          Issue Type: Improvement
            Reporter: Ravi Kishore Valeti


While investigating a production incident, we observed that some procedures were 
created but never executed until an HMaster failover.

- master-2 was active & rs-1 was holding meta
- 18:40, a bunch of RSs (~80) were reported crashed; SCPs were created & were 
being executed
- 19:51, balancer decided to move Meta region to another RS -> TRSP created -> 
Meta region went offline
- 19:52, RS carrying meta crashed -> SCP created
- 19:52, both the TRSP & the SCP appeared stuck/not executing; there were no 
more logs related to these procedures
- 21:09 - Master failed over from master-2 to master-3
- Procs were loaded from store & attached.
- 21:17, the TRSP for meta completed & meta came back online.


I will post the logs shortly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
