Hi folks,

I recently uncovered a problem in the ModifyTableProcedure that I want
to discuss with the community. I altered a table to increase the
region replica count from 1 to 2. This led to some of the secondary
replica regions being stuck opening, which was difficult to remediate,
although luckily did not impact database users. I believe that the
reason is because of an unfortunate interaction between the
ModifyTableProcedure and the SplitTableRegionProcedure.

When adding new replicas, ModifyTableProcedure roughly does the following:
1. Edits the table descriptor
2. Closes and opens all existing regions
3. Creates replication peer to support the new replica regions
4. Opens new replica regions

When closing a region in step 2, the act of closing it can trigger a
split. When the HMaster runs the SplitTableRegionProcedure, it roughly
does:
1. Close the region to be split
2. Split the store files
3. Open the new daughter regions (including replicas)

When a split is done as part of a ModifyTableProcedure that adds
replicas, we get the following combination of behavior:
1. Edit the table descriptor
2. Close and open all existing regions, adding a
SplitTableRegionProcedure to the HMaster's queue
3. To begin the SplitTableRegionProcedure, close region to be split
4. Split store files
5. Open the new daughter regions, including replicas, completing the
SplitTableRegionProcedure
6. Create replication peer to support the new replica regions
7. Open new replica regions, even though some were already opened in
step 5. Double-opening doesn't work, and the open procedures are
stuck.

For some proof, I've prepared selected log lines from the incident at
my company that prompted this email, which are at the end of this
email.

In thinking about this problem, I looked at the code that triggers
splits. This is in the MemStoreFlusher class, which is invoked on
every region close. It sometimes does a split if HRegion#checkSplit()
returns non-empty. That method will return empty if the region's
"closing" flag is set, however the order of operations of a region
close means that the closing flag is not yet set when checkSplit()
checks it. This seems like an oversight to me, and perhaps the
solution is as simple as moving the `closing.set(true)` call to
somewhere earlier in HRegion#doClose(). However, I don't understand
what the full implications of this would be.

I also thought about this from another angle: there is a span of time
when the ModifyTableProcedure has modified the table descriptor, but
has not yet opened the new replicas, and during that time, it assumes
that no other procedures will act upon the new information in the
table descriptor. In my case, the SplitTableRegionProcedure violated
this assumption. Perhaps this assumption should be made enforceable,
like somehow no region operations should be permitted by non-children
of the ModifyTableProcedure until it has completed.

What do you all think?

Thank you for your time,
Charles

Logs from the incident:
// Start of ModifyTableProcedure
(hmaster) 2025-05-06T17:37:07,884
[RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=60000] INFO
org.apache.hadoop.hbase.master.HMaster:
Client=HBase2TasksJobs-setReadReplication//172.16.81.147 modify table
email-objects-1 from 'email-objects-1', {...} to 'email-objects-1',
{... REGION_REPLICATION => '2',...}

// Re-open b85d893815b488f1282159d89e1cc86b to pick up new table descriptor
(hmaster) 2025-05-06T17:37:35,717 [PEWorker-2] INFO
org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized
subprocedures=[...{pid=22035700, ppid=22034599,
state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE;
TransitRegionStateProcedure table=email-objects-1,
region=b85d893815b488f1282159d89e1cc86b, REOPEN/MOVE}...]

// Close b85d893815b488f1282159d89e1cc86b
(hmaster) 2025-05-06T17:37:35,816 [PEWorker-5] INFO
org.apache.hadoop.hbase.master.procedure.MasterProcedureScheduler:
Took xlock for pid=22035700, ppid=22034599,
state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE;
TransitRegionStateProcedure table=email-objects-1,
region=b85d893815b488f1282159d89e1cc86b, REOPEN/MOVE
(hmaster) 2025-05-06T17:37:35,817 [PEWorker-5] INFO
org.apache.hadoop.hbase.master.assignment.RegionStateStore:
pid=22035700 updating hbase:meta row=b85d893815b488f1282159d89e1cc86b,
regionState=CLOSING,
regionLocation=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104
(hmaster) 2025-05-06T17:37:35,827 [PEWorker-5] INFO
org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized
subprocedures=[{pid=22035776, ppid=22035700, state=RUNNABLE;
CloseRegionProcedure b85d893815b488f1282159d89e1cc86b,
server=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104}]
(regionserver) 2025-05-06 17:37:35.986 UTC Close
b85d893815b488f1282159d89e1cc86b
(regionserver) 2025-05-06 17:37:35.986 UTC Running close preflush of
b85d893815b488f1282159d89e1cc86b
// At this point I suspect the RegionServer sent a split request to the HMaster
(regionserver) 2025-05-06 17:37:35.987 UTC Flushing
b85d893815b488f1282159d89e1cc86b 2/2 column families, dataSize=8.25 MB
heapSize=10.12 MB
(regionserver) 2025-05-06 17:37:36.057 UTC Flushed memstore data
size=7.80 MB at sequenceid=26941273 (bloomFilter=true),
to=hdfs://pluto-hb2-a-prod:8020/hbase/data/default/email-objects-1/b85d893815b488f1282159d89e1cc86b/.tmp/e/7ffadb785b8e4ed88567437cc7633e09
(regionserver) 2025-05-06 17:37:36.117 UTC Flushed memstore data
size=455.16 KB at sequenceid=26941273 (bloomFilter=true),
to=hdfs://pluto-hb2-a-prod:8020/hbase/data/default/email-objects-1/b85d893815b488f1282159d89e1cc86b/.tmp/i/ec9f55f487a443d99636f4aa6dd297d7
(hmaster) 2025-05-06T17:37:36,174 [PEWorker-9] INFO
org.apache.hadoop.hbase.master.assignment.RegionStateStore:
pid=22035700 updating hbase:meta row=b85d893815b488f1282159d89e1cc86b,
regionState=CLOSED
(hmaster) 2025-05-06T17:37:36,182 [PEWorker-8] INFO
org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure:
Starting pid=22035700, ppid=22034599,
state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE,
locked=true; TransitRegionStateProcedure table=email-objects-1,
region=b85d893815b488f1282159d89e1cc86b, REOPEN/MOVE; state=CLOSED,
location=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104;
forceNewPlan=false, retain=true
(hmaster) 2025-05-06T17:37:36,182 [PEWorker-9] INFO
org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
pid=22035776, ppid=22035700, state=SUCCESS; CloseRegionProcedure
b85d893815b488f1282159d89e1cc86b,
server=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104
in 357 msec

// Open b85d893815b488f1282159d89e1cc86b with new table descriptor
(hmaster) 2025-05-06T17:37:36,239 [PEWorker-6] INFO
org.apache.hadoop.hbase.master.assignment.RegionStateStore:
pid=22035700 updating hbase:meta row=b85d893815b488f1282159d89e1cc86b,
regionState=OPENING,
regionLocation=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104
(hmaster) 2025-05-06T17:37:36,245 [PEWorker-6] INFO
org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized
subprocedures=[{pid=22035800, ppid=22035700, state=RUNNABLE;
OpenRegionProcedure b85d893815b488f1282159d89e1cc86b,
server=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104}]
(hmaster) 2025-05-06T17:37:36,529 [PEWorker-10] INFO
org.apache.hadoop.hbase.master.assignment.RegionStateStore:
pid=22035700 updating hbase:meta row=b85d893815b488f1282159d89e1cc86b,
regionState=OPEN, openSeqNum=26941277,
regionLocation=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104
(hmaster) 2025-05-06T17:37:36,539 [PEWorker-10] INFO
org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
pid=22035800, ppid=22035700, state=SUCCESS; OpenRegionProcedure
b85d893815b488f1282159d89e1cc86b,
server=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104
in 290 msec
(hmaster) 2025-05-06T17:37:36,557 [PEWorker-7] INFO
org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
pid=22035700, ppid=22034599, state=SUCCESS;
TransitRegionStateProcedure table=email-objects-1,
region=b85d893815b488f1282159d89e1cc86b, REOPEN/MOVE in 822 msec

// The requested split of b85d893815b488f1282159d89e1cc86b begins on
the HMaster, beginning with closing the region. It works fine because
b85d893815b488f1282159d89e1cc86b is open.
(hmaster) 2025-05-06T17:37:43,225 [PEWorker-1] INFO
org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized
subprocedures=[{pid=22036140, ppid=22036139,
state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE;
TransitRegionStateProcedure table=email-objects-1,
region=b85d893815b488f1282159d89e1cc86b, UNASSIGN}, {pid=22036141,
ppid=22036139, state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE;
TransitRegionStateProcedure table=email-objects-1,
region=b47e159aa157e066841e67ca1490a55d, UNASSIGN}]
(hmaster) 2025-05-06T17:37:43,231 [PEWorker-6] INFO
org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized
subprocedures=[{pid=22036142, ppid=22036140, state=RUNNABLE;
CloseRegionProcedure b85d893815b488f1282159d89e1cc86b,
server=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104}]
(hmaster) 2025-05-06T17:37:45,352 [PEWorker-1] INFO
org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
pid=22036142, ppid=22036140, state=SUCCESS; CloseRegionProcedure
b85d893815b488f1282159d89e1cc86b,
server=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104
in 2.1170 sec
(hmaster) 2025-05-06T17:37:45,355 [PEWorker-4] INFO
org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
pid=22036140, ppid=22036139, state=SUCCESS;
TransitRegionStateProcedure table=email-objects-1,
region=b85d893815b488f1282159d89e1cc86b, UNASSIGN in 2.1270 sec
(hmaster) 2025-05-06T17:37:45,389 [PEWorker-3] INFO
org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure:
pid=22036139 splitting 7 storefiles,
region=b85d893815b488f1282159d89e1cc86b, threads=7

// The region is now split.
// Open split daughters 942ae945978493ac3304a6eeb286a110 and
2e97f125b567c33626891076fdb4f500, which is okay
// and also open daughter replicas f2edc050a8ea03ac9bef9554425b33e7
and f48f8695abfe36eba7211a8080808c65, which is not okay
(hmaster) 2025-05-06T17:37:46,515 [PEWorker-3] INFO
org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized
subprocedures=[{pid=22036145, ppid=22036139,
state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE;
TransitRegionStateProcedure table=email-objects-1,
region=942ae945978493ac3304a6eeb286a110, ASSIGN}, {pid=22036146,
ppid=22036139, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE;
TransitRegionStateProcedure table=email-objects-1,
region=2e97f125b567c33626891076fdb4f500, ASSIGN}, {pid=22036147,
ppid=22036139, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE;
TransitRegionStateProcedure table=email-objects-1,
region=f2edc050a8ea03ac9bef9554425b33e7, ASSIGN}, {pid=22036148,
ppid=22036139, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE;
TransitRegionStateProcedure table=email-objects-1,
region=f48f8695abfe36eba7211a8080808c65, ASSIGN}]
...
(hmaster) 2025-05-06T17:37:46,960 [PEWorker-5] INFO
org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
pid=22036145, ppid=22036139, state=SUCCESS;
TransitRegionStateProcedure table=email-objects-1,
region=942ae945978493ac3304a6eeb286a110, ASSIGN in 443 msec
(hmaster) 2025-05-06T17:37:46,961 [PEWorker-8] INFO
org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
pid=22036146, ppid=22036139, state=SUCCESS;
TransitRegionStateProcedure table=email-objects-1,
region=2e97f125b567c33626891076fdb4f500, ASSIGN in 444 msec
(hmaster) 2025-05-06T17:37:47,027 [PEWorker-12] INFO
org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
pid=22036147, ppid=22036139, state=SUCCESS;
TransitRegionStateProcedure table=email-objects-1,
region=f2edc050a8ea03ac9bef9554425b33e7, ASSIGN in 510 msec
(hmaster) 2025-05-06T17:37:47,101 [PEWorker-10] INFO
org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
pid=22036148, ppid=22036139, state=SUCCESS;
TransitRegionStateProcedure table=email-objects-1,
region=f48f8695abfe36eba7211a8080808c65, ASSIGN in 582 msec
(hmaster) 2025-05-06T17:37:47,109 [PEWorker-11] INFO
org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
pid=22036139, state=SUCCESS; SplitTableRegionProcedure
table=email-objects-1, parent=b85d893815b488f1282159d89e1cc86b,
daughterA=942ae945978493ac3304a6eeb286a110,
daughterB=2e97f125b567c33626891076fdb4f500 in 3.8930 sec

// Finish reopening all regions, moving on with ModifyTableProcedure
(hmaster) 2025-05-06T17:46:22,158 [PEWorker-4] INFO
org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
pid=22034599, ppid=22034598, state=SUCCESS;
ReopenTableRegionsProcedure table=email-objects-1 in 9 mins, 14.178
sec
(hmaster) 2025-05-06T17:46:22,242 [PEWorker-2] INFO
org.apache.hadoop.hbase.util.ServerRegionReplicaUtil: Region replica
replication peer id=region_replica_replication not exist. Creating...
(hmaster) 2025-05-06T17:46:22,246 [PEWorker-2] INFO
org.apache.hadoop.hbase.master.HMaster: Client=null/null creating
replication peer, id=region_replica_replication,
config=clusterKey=zk-hbase-pluto-hb2-a-prod-lb.zk.svc.iad02.k8s.run:2181:2181:/hbase,replicationEndpointImpl=org.apache.hadoop.hbase.replication.regionserver.RegionReplicaReplicationEndpoint,replicateAllUserTables=true,bandwidth=0,serial=false,
state=ENABLED
...
(hmaster) 2025-05-06T17:46:23,562 [PEWorker-4] INFO
org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
pid=22060604, ppid=22060356, state=SUCCESS;
org.apache.hadoop.hbase.master.replication.RefreshPeerProcedure in
1.2600 sec
(hmaster) 2025-05-06T17:46:23,562 [PEWorker-3] INFO
org.apache.hadoop.hbase.master.replication.AddPeerProcedure:
Successfully added ENABLED peer region_replica_replication, config
clusterKey=zk-hbase-pluto-hb2-a-prod-lb.zk.svc.iad02.k8s.run:2181:2181:/hbase,replicationEndpointImpl=org.apache.hadoop.hbase.replication.regionserver.RegionReplicaReplicationEndpoint,replicateAllUserTables=true,bandwidth=0,serial=false
(hmaster) 2025-05-06T17:46:23,565 [PEWorker-3] INFO
org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
pid=22060356, state=SUCCESS;
org.apache.hadoop.hbase.master.replication.AddPeerProcedure in 1.3140
sec

// Open all region replicas, including
f2edc050a8ea03ac9bef9554425b33e7 and f48f8695abfe36eba7211a8080808c65
(hmaster) 2025-05-06T17:46:23,571 [PEWorker-2] INFO
org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized
subprocedures=[...{pid=22060998, ppid=22034598,
state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE;
TransitRegionStateProcedure table=email-objects-1,
region=f2edc050a8ea03ac9bef9554425b33e7, ASSIGN}, {pid=22060999,
ppid=22034598, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE;
TransitRegionStateProcedure table=email-objects-1,
region=f48f8695abfe36eba7211a8080808c65, ASSIGN}...]

// Some opens fail because they are already open
(hmaster) 2025-05-06T17:47:32,989 [ProcExecTimeout] WARN
org.apache.hadoop.hbase.master.assignment.AssignmentManager: STUCK
Region-In-Transition state=OPENING,
location=na1-rough-still-rose.iad02.hubinternal.net,60020,1745779142214,
table=email-objects-1, region=f2edc050a8ea03ac9bef9554425b33e7
(hmaster) 2025-05-06T17:48:32,993 [ProcExecTimeout] WARN
org.apache.hadoop.hbase.master.assignment.AssignmentManager: STUCK
Region-In-Transition state=OPENING,
location=na1-just-fast-magpie.iad02.hubinternal.net,60020,1745753381085,
table=email-objects-1, region=f48f8695abfe36eba7211a8080808c65

Reply via email to