Hi folks, I recently uncovered a problem in the ModifyTableProcedure that I want to discuss with the community. I altered a table to increase the region replica count from 1 to 2. This led to some of the secondary replica regions being stuck opening, which was difficult to remediate, although luckily did not impact database users. I believe that the reason is because of an unfortunate interaction between the ModifyTableProcedure and the SplitTableRegionProcedure.
When adding new replicas, ModifyTableProcedure roughly does the following: 1. Edits the table descriptor 2. Closes and opens all existing regions 3. Creates replication peer to support the new replica regions 4. Opens new replica regions When closing a region in step 2, the act of closing it can trigger a split. When the HMaster runs the SplitTableRegionProcedure, it roughly does: 1. Close the region to be split 2. Split the store files 3. Open the new daughter regions (including replicas) When a split is done as part of a ModifyTableProcedure that adds replicas, we get the following combination of behavior: 1. Edit the table descriptor 2. Close and open all existing regions, adding a SplitTableRegionProcedure to the HMaster's queue 3. To begin the SplitTableRegionProcedure, close region to be split 4. Split store files 5. Open the new daughter regions, including replicas, completing the SplitTableRegionProcedure 6. Create replication peer to support the new replica regions 7. Open new replica regions, even though some were already opened in step 5. Double-opening doesn't work, and the open procedures are stuck. For some proof, I've prepared selected log lines from the incident at my company that prompted this email, which are at the end of this email. In thinking about this problem, I looked at the code that triggers splits. This is in the MemStoreFlusher class, which is invoked on every region close. It sometimes does a split if HRegion#checkSplit() returns non-empty. That method will return empty if the region's "closing" flag is set, however the order of operations of a region close means that the closing flag is not yet set when checkSplit() checks it. This seems like an oversight to me, and perhaps the solution is as simple as moving the `closing.set(true)` call to somewhere earlier in HRegion#doClose(). However, I don't understand what the full implications of this would be. I also thought about this from another angle: there is a span of time when the ModifyTableProcedure has modified the table descriptor, but has not yet opened the new replicas, and during that time, it assumes that no other procedures will act upon the new information in the table descriptor. In my case, the SplitTableRegionProcedure violated this assumption. Perhaps this assumption should be made enforceable, like somehow no region operations should be permitted by non-children of the ModifyTableProcedure until it has completed. What do you all think? Thank you for your time, Charles Logs from the incident: // Start of ModifyTableProcedure (hmaster) 2025-05-06T17:37:07,884 [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=60000] INFO org.apache.hadoop.hbase.master.HMaster: Client=HBase2TasksJobs-setReadReplication//172.16.81.147 modify table email-objects-1 from 'email-objects-1', {...} to 'email-objects-1', {... REGION_REPLICATION => '2',...} // Re-open b85d893815b488f1282159d89e1cc86b to pick up new table descriptor (hmaster) 2025-05-06T17:37:35,717 [PEWorker-2] INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized subprocedures=[...{pid=22035700, ppid=22034599, state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE; TransitRegionStateProcedure table=email-objects-1, region=b85d893815b488f1282159d89e1cc86b, REOPEN/MOVE}...] // Close b85d893815b488f1282159d89e1cc86b (hmaster) 2025-05-06T17:37:35,816 [PEWorker-5] INFO org.apache.hadoop.hbase.master.procedure.MasterProcedureScheduler: Took xlock for pid=22035700, ppid=22034599, state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE; TransitRegionStateProcedure table=email-objects-1, region=b85d893815b488f1282159d89e1cc86b, REOPEN/MOVE (hmaster) 2025-05-06T17:37:35,817 [PEWorker-5] INFO org.apache.hadoop.hbase.master.assignment.RegionStateStore: pid=22035700 updating hbase:meta row=b85d893815b488f1282159d89e1cc86b, regionState=CLOSING, regionLocation=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104 (hmaster) 2025-05-06T17:37:35,827 [PEWorker-5] INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized subprocedures=[{pid=22035776, ppid=22035700, state=RUNNABLE; CloseRegionProcedure b85d893815b488f1282159d89e1cc86b, server=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104}] (regionserver) 2025-05-06 17:37:35.986 UTC Close b85d893815b488f1282159d89e1cc86b (regionserver) 2025-05-06 17:37:35.986 UTC Running close preflush of b85d893815b488f1282159d89e1cc86b // At this point I suspect the RegionServer sent a split request to the HMaster (regionserver) 2025-05-06 17:37:35.987 UTC Flushing b85d893815b488f1282159d89e1cc86b 2/2 column families, dataSize=8.25 MB heapSize=10.12 MB (regionserver) 2025-05-06 17:37:36.057 UTC Flushed memstore data size=7.80 MB at sequenceid=26941273 (bloomFilter=true), to=hdfs://pluto-hb2-a-prod:8020/hbase/data/default/email-objects-1/b85d893815b488f1282159d89e1cc86b/.tmp/e/7ffadb785b8e4ed88567437cc7633e09 (regionserver) 2025-05-06 17:37:36.117 UTC Flushed memstore data size=455.16 KB at sequenceid=26941273 (bloomFilter=true), to=hdfs://pluto-hb2-a-prod:8020/hbase/data/default/email-objects-1/b85d893815b488f1282159d89e1cc86b/.tmp/i/ec9f55f487a443d99636f4aa6dd297d7 (hmaster) 2025-05-06T17:37:36,174 [PEWorker-9] INFO org.apache.hadoop.hbase.master.assignment.RegionStateStore: pid=22035700 updating hbase:meta row=b85d893815b488f1282159d89e1cc86b, regionState=CLOSED (hmaster) 2025-05-06T17:37:36,182 [PEWorker-8] INFO org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure: Starting pid=22035700, ppid=22034599, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, locked=true; TransitRegionStateProcedure table=email-objects-1, region=b85d893815b488f1282159d89e1cc86b, REOPEN/MOVE; state=CLOSED, location=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104; forceNewPlan=false, retain=true (hmaster) 2025-05-06T17:37:36,182 [PEWorker-9] INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=22035776, ppid=22035700, state=SUCCESS; CloseRegionProcedure b85d893815b488f1282159d89e1cc86b, server=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104 in 357 msec // Open b85d893815b488f1282159d89e1cc86b with new table descriptor (hmaster) 2025-05-06T17:37:36,239 [PEWorker-6] INFO org.apache.hadoop.hbase.master.assignment.RegionStateStore: pid=22035700 updating hbase:meta row=b85d893815b488f1282159d89e1cc86b, regionState=OPENING, regionLocation=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104 (hmaster) 2025-05-06T17:37:36,245 [PEWorker-6] INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized subprocedures=[{pid=22035800, ppid=22035700, state=RUNNABLE; OpenRegionProcedure b85d893815b488f1282159d89e1cc86b, server=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104}] (hmaster) 2025-05-06T17:37:36,529 [PEWorker-10] INFO org.apache.hadoop.hbase.master.assignment.RegionStateStore: pid=22035700 updating hbase:meta row=b85d893815b488f1282159d89e1cc86b, regionState=OPEN, openSeqNum=26941277, regionLocation=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104 (hmaster) 2025-05-06T17:37:36,539 [PEWorker-10] INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=22035800, ppid=22035700, state=SUCCESS; OpenRegionProcedure b85d893815b488f1282159d89e1cc86b, server=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104 in 290 msec (hmaster) 2025-05-06T17:37:36,557 [PEWorker-7] INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=22035700, ppid=22034599, state=SUCCESS; TransitRegionStateProcedure table=email-objects-1, region=b85d893815b488f1282159d89e1cc86b, REOPEN/MOVE in 822 msec // The requested split of b85d893815b488f1282159d89e1cc86b begins on the HMaster, beginning with closing the region. It works fine because b85d893815b488f1282159d89e1cc86b is open. (hmaster) 2025-05-06T17:37:43,225 [PEWorker-1] INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized subprocedures=[{pid=22036140, ppid=22036139, state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE; TransitRegionStateProcedure table=email-objects-1, region=b85d893815b488f1282159d89e1cc86b, UNASSIGN}, {pid=22036141, ppid=22036139, state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE; TransitRegionStateProcedure table=email-objects-1, region=b47e159aa157e066841e67ca1490a55d, UNASSIGN}] (hmaster) 2025-05-06T17:37:43,231 [PEWorker-6] INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized subprocedures=[{pid=22036142, ppid=22036140, state=RUNNABLE; CloseRegionProcedure b85d893815b488f1282159d89e1cc86b, server=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104}] (hmaster) 2025-05-06T17:37:45,352 [PEWorker-1] INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=22036142, ppid=22036140, state=SUCCESS; CloseRegionProcedure b85d893815b488f1282159d89e1cc86b, server=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104 in 2.1170 sec (hmaster) 2025-05-06T17:37:45,355 [PEWorker-4] INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=22036140, ppid=22036139, state=SUCCESS; TransitRegionStateProcedure table=email-objects-1, region=b85d893815b488f1282159d89e1cc86b, UNASSIGN in 2.1270 sec (hmaster) 2025-05-06T17:37:45,389 [PEWorker-3] INFO org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure: pid=22036139 splitting 7 storefiles, region=b85d893815b488f1282159d89e1cc86b, threads=7 // The region is now split. // Open split daughters 942ae945978493ac3304a6eeb286a110 and 2e97f125b567c33626891076fdb4f500, which is okay // and also open daughter replicas f2edc050a8ea03ac9bef9554425b33e7 and f48f8695abfe36eba7211a8080808c65, which is not okay (hmaster) 2025-05-06T17:37:46,515 [PEWorker-3] INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized subprocedures=[{pid=22036145, ppid=22036139, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; TransitRegionStateProcedure table=email-objects-1, region=942ae945978493ac3304a6eeb286a110, ASSIGN}, {pid=22036146, ppid=22036139, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; TransitRegionStateProcedure table=email-objects-1, region=2e97f125b567c33626891076fdb4f500, ASSIGN}, {pid=22036147, ppid=22036139, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; TransitRegionStateProcedure table=email-objects-1, region=f2edc050a8ea03ac9bef9554425b33e7, ASSIGN}, {pid=22036148, ppid=22036139, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; TransitRegionStateProcedure table=email-objects-1, region=f48f8695abfe36eba7211a8080808c65, ASSIGN}] ... (hmaster) 2025-05-06T17:37:46,960 [PEWorker-5] INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=22036145, ppid=22036139, state=SUCCESS; TransitRegionStateProcedure table=email-objects-1, region=942ae945978493ac3304a6eeb286a110, ASSIGN in 443 msec (hmaster) 2025-05-06T17:37:46,961 [PEWorker-8] INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=22036146, ppid=22036139, state=SUCCESS; TransitRegionStateProcedure table=email-objects-1, region=2e97f125b567c33626891076fdb4f500, ASSIGN in 444 msec (hmaster) 2025-05-06T17:37:47,027 [PEWorker-12] INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=22036147, ppid=22036139, state=SUCCESS; TransitRegionStateProcedure table=email-objects-1, region=f2edc050a8ea03ac9bef9554425b33e7, ASSIGN in 510 msec (hmaster) 2025-05-06T17:37:47,101 [PEWorker-10] INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=22036148, ppid=22036139, state=SUCCESS; TransitRegionStateProcedure table=email-objects-1, region=f48f8695abfe36eba7211a8080808c65, ASSIGN in 582 msec (hmaster) 2025-05-06T17:37:47,109 [PEWorker-11] INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=22036139, state=SUCCESS; SplitTableRegionProcedure table=email-objects-1, parent=b85d893815b488f1282159d89e1cc86b, daughterA=942ae945978493ac3304a6eeb286a110, daughterB=2e97f125b567c33626891076fdb4f500 in 3.8930 sec // Finish reopening all regions, moving on with ModifyTableProcedure (hmaster) 2025-05-06T17:46:22,158 [PEWorker-4] INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=22034599, ppid=22034598, state=SUCCESS; ReopenTableRegionsProcedure table=email-objects-1 in 9 mins, 14.178 sec (hmaster) 2025-05-06T17:46:22,242 [PEWorker-2] INFO org.apache.hadoop.hbase.util.ServerRegionReplicaUtil: Region replica replication peer id=region_replica_replication not exist. Creating... (hmaster) 2025-05-06T17:46:22,246 [PEWorker-2] INFO org.apache.hadoop.hbase.master.HMaster: Client=null/null creating replication peer, id=region_replica_replication, config=clusterKey=zk-hbase-pluto-hb2-a-prod-lb.zk.svc.iad02.k8s.run:2181:2181:/hbase,replicationEndpointImpl=org.apache.hadoop.hbase.replication.regionserver.RegionReplicaReplicationEndpoint,replicateAllUserTables=true,bandwidth=0,serial=false, state=ENABLED ... (hmaster) 2025-05-06T17:46:23,562 [PEWorker-4] INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=22060604, ppid=22060356, state=SUCCESS; org.apache.hadoop.hbase.master.replication.RefreshPeerProcedure in 1.2600 sec (hmaster) 2025-05-06T17:46:23,562 [PEWorker-3] INFO org.apache.hadoop.hbase.master.replication.AddPeerProcedure: Successfully added ENABLED peer region_replica_replication, config clusterKey=zk-hbase-pluto-hb2-a-prod-lb.zk.svc.iad02.k8s.run:2181:2181:/hbase,replicationEndpointImpl=org.apache.hadoop.hbase.replication.regionserver.RegionReplicaReplicationEndpoint,replicateAllUserTables=true,bandwidth=0,serial=false (hmaster) 2025-05-06T17:46:23,565 [PEWorker-3] INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=22060356, state=SUCCESS; org.apache.hadoop.hbase.master.replication.AddPeerProcedure in 1.3140 sec // Open all region replicas, including f2edc050a8ea03ac9bef9554425b33e7 and f48f8695abfe36eba7211a8080808c65 (hmaster) 2025-05-06T17:46:23,571 [PEWorker-2] INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized subprocedures=[...{pid=22060998, ppid=22034598, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; TransitRegionStateProcedure table=email-objects-1, region=f2edc050a8ea03ac9bef9554425b33e7, ASSIGN}, {pid=22060999, ppid=22034598, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; TransitRegionStateProcedure table=email-objects-1, region=f48f8695abfe36eba7211a8080808c65, ASSIGN}...] // Some opens fail because they are already open (hmaster) 2025-05-06T17:47:32,989 [ProcExecTimeout] WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: STUCK Region-In-Transition state=OPENING, location=na1-rough-still-rose.iad02.hubinternal.net,60020,1745779142214, table=email-objects-1, region=f2edc050a8ea03ac9bef9554425b33e7 (hmaster) 2025-05-06T17:48:32,993 [ProcExecTimeout] WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: STUCK Region-In-Transition state=OPENING, location=na1-just-fast-magpie.iad02.hubinternal.net,60020,1745753381085, table=email-objects-1, region=f48f8695abfe36eba7211a8080808c65