qazwsx created HBASE-29552:
------------------------------

             Summary: RegionRemoteProcedureBase inconsistent state loading 
caused startup failure.
                 Key: HBASE-29552
                 URL: https://issues.apache.org/jira/browse/HBASE-29552
             Project: HBase
          Issue Type: Bug
            Reporter: qazwsx


Before the power failure, the balancer was moving the Region
(9bf8064aa66e5c6391bcf1d291f5e3fa), which triggered a
TransitRegionStateProcedure to execute the Move operation. Because part of the
data that HDFS held in memory had not been persisted when the power failed, the
state of the Region recorded in the META table was OPENING, and the Procedure
record with pid=53510 was lost.
 
After the restart, loadProcedure reloaded the CloseRegionProcedure, the
transitionState call made while restoring it failed, and as a result the
Master service could not start.
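
The failure described above can be modeled in a few lines. The following Java
sketch is a simplified stand-in, not the HBase source: the class
TransitionCheckSketch, its Node type, and the use of IllegalStateException in
place of UnexpectedStateException are assumptions made for illustration. It
only shows why a node loaded as OPENING rejects a transition that expects
CLOSING or CLOSED.

```java
import java.util.Arrays;

// Hypothetical model of the state check; names are illustrative, not HBase's.
class TransitionCheckSketch {
    enum State { OPENING, OPEN, CLOSING, CLOSED }

    /** Stand-in for a region's in-memory state node. */
    static final class Node {
        private State state;

        Node(State initial) {
            this.state = initial;
        }

        State getState() {
            return state;
        }

        /** Moves to 'update' only if the current state is one of 'expected'. */
        void transitionState(State update, State... expected) {
            if (!Arrays.asList(expected).contains(state)) {
                throw new IllegalStateException("Expected " + Arrays.toString(expected)
                    + " so could move to " + update + " but current state=" + state);
            }
            state = update;
        }
    }

    public static void main(String[] args) {
        // State as read back from META after the power failure.
        Node node = new Node(State.OPENING);
        try {
            // What restoring a succeeded close attempts in this model.
            node.transitionState(State.CLOSED, State.CLOSING, State.CLOSED);
        } catch (IllegalStateException e) {
            // prints: Expected [CLOSING, CLOSED] so could move to CLOSED but current state=OPENING
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the node stays in OPENING after the rejected transition, which
matches the state reported in the exception in the log further down.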
 
# log
before power off: 
2025-08-24 13:53:33,254 | INFO  | master/ndp-hbase-master-1:16000.Chore.5 | 
balance hri=9bf8064aa66e5c6391bcf1d291f5e3fa, 
source=ndp-hbase-region-0.hbaseregion.sop.svc.cluster.local,16020,1756010506000,
 
destination=ndp-hbase-region-1.hbaseregion.sop.svc.cluster.local,16020,1756010435228
 | 
org.apache.hadoop.hbase.master.HMaster.executeRegionPlansWithThrottling(HMaster.java:1987)
2025-08-24 13:53:33,266 | INFO  | PEWorker-10 | Initialized 
subprocedures=[{pid=53505, ppid=53504, state=RUNNABLE; CloseRegionProcedure 
9bf8064aa66e5c6391bcf1d291f5e3fa, 
server=ndp-hbase-region-0.hbaseregion.sop.svc.cluster.local,16020,1756010506000}]
 | 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1685)
2025-08-24 13:53:33,423 | INFO  | RSProcedureDispatcher-pool-23 | Using 
KERBEROS authentication for service=AdminService, sasl=true, type='kerberos' | 
org.apache.hadoop.hbase.ipc.RpcConnection.<init>(RpcConnection.java:124)
2025-08-24 14:01:50,565 | INFO  | PEWorker-15 | pid=53504 updating hbase:meta 
row=9bf8064aa66e5c6391bcf1d291f5e3fa, regionState=CLOSED | 
org.apache.hadoop.hbase.master.assignment.RegionStateStore.createPutForRegionLocUpdate(RegionStateStore.java:253)
2025-08-24 14:01:50,569 | INFO  | PEWorker-15 | Finished pid=53505, ppid=53504, 
state=SUCCESS; CloseRegionProcedure 9bf8064aa66e5c6391bcf1d291f5e3fa, 
server=ndp-hbase-region-0.hbaseregion.sop.svc.cluster.local,16020,1756010506000 
in 8 mins, 17.302 sec | 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1414)
2025-08-24 14:01:50,569 | INFO  | PEWorker-12 | Starting pid=53504, 
state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, locked=true; 
TransitRegionStateProcedure table=student001, 
region=9bf8064aa66e5c6391bcf1d291f5e3fa, REOPEN/MOVE; state=CLOSED, 
location=ndp-hbase-region-1.hbaseregion.sop.svc.cluster.local,16020,1756010435228;
 forceNewPlan=false, retain=false | 
org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.queueAssign(TransitRegionStateProcedure.java:250)
2025-08-24 14:01:50,720 | INFO  | PEWorker-18 | pid=53504 updating hbase:meta 
row=9bf8064aa66e5c6391bcf1d291f5e3fa, regionState=OPENING, 
regionLocation=ndp-hbase-region-1.hbaseregion.sop.svc.cluster.local,16020,1756010435228
 | 
org.apache.hadoop.hbase.master.assignment.RegionStateStore.createPutForRegionLocUpdate(RegionStateStore.java:253)
2025-08-24 14:01:50,726 | INFO  | PEWorker-18 | Initialized 
subprocedures=[{pid=53510, ppid=53504, state=RUNNABLE; OpenRegionProcedure 
9bf8064aa66e5c6391bcf1d291f5e3fa, 
server=ndp-hbase-region-1.hbaseregion.sop.svc.cluster.local,16020,1756010435228}]
 | 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1685)
2025-08-24 14:01:51,054 | INFO  | PEWorker-5 | pid=53504 updating hbase:meta 
row=9bf8064aa66e5c6391bcf1d291f5e3fa, regionState=OPEN, openSeqNum=96213, 
regionLocation=ndp-hbase-region-1.hbaseregion.sop.svc.cluster.local,16020,1756010435228
 | 
org.apache.hadoop.hbase.master.assignment.RegionStateStore.createPutForRegionLocUpdate(RegionStateStore.java:253)
2025-08-24 14:01:51,059 | INFO  | PEWorker-5 | Finished pid=53510, ppid=53504, 
state=SUCCESS; OpenRegionProcedure 9bf8064aa66e5c6391bcf1d291f5e3fa, 
server=ndp-hbase-region-1.hbaseregion.sop.svc.cluster.local,16020,1756010435228 
in 330 msec | 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1414)
2025-08-24 14:01:51,060 | INFO  | PEWorker-7 | Finished pid=53504, 
state=SUCCESS; TransitRegionStateProcedure table=student001, 
region=9bf8064aa66e5c6391bcf1d291f5e3fa, REOPEN/MOVE in 8 mins, 17.804 sec | 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1414)
 
after power off (startup failed during recovery): 
2025-08-24 14:58:19,266 | ERROR | 
master/ndp-hbase-master-1:16000:becomeActiveMaster | Failed to become active 
master | 
org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2393)
java.lang.AssertionError: 
org.apache.hadoop.hbase.exceptions.UnexpectedStateException: Expected [CLOSING, 
CLOSED] so could move to CLOSED but current state=OPENING
at 
org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.stateLoaded(RegionRemoteProcedureBase.java:290)
at 
org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.stateLoaded(TransitRegionStateProcedure.java:668)
at 
org.apache.hadoop.hbase.master.assignment.AssignmentManager$RegionMetaLoadingVisitor.visitRegionState(AssignmentManager.java:1879)
at 
org.apache.hadoop.hbase.master.assignment.RegionStateStore.visitMetaEntry(RegionStateStore.java:153)
at 
org.apache.hadoop.hbase.master.assignment.RegionStateStore.access$100(RegionStateStore.java:66)
at 
org.apache.hadoop.hbase.master.assignment.RegionStateStore$1.visit(RegionStateStore.java:95)
at 
org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:809)
at 
org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:755)
at 
org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:716)
at 
org.apache.hadoop.hbase.MetaTableAccessor.fullScanRegions(MetaTableAccessor.java:193)
at 
org.apache.hadoop.hbase.master.assignment.RegionStateStore.visitMeta(RegionStateStore.java:85)
at 
org.apache.hadoop.hbase.master.assignment.AssignmentManager.loadMeta(AssignmentManager.java:1909)
at 
org.apache.hadoop.hbase.master.assignment.AssignmentManager.joinCluster(AssignmentManager.java:1779)
at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1035)
at 
org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2389)
at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:558)
at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.hadoop.hbase.exceptions.UnexpectedStateException: 
Expected [CLOSING, CLOSED] so could move to CLOSED but current state=OPENING
at 
org.apache.hadoop.hbase.master.assignment.RegionStateNode.transitionState(RegionStateNode.java:142)
at 
org.apache.hadoop.hbase.master.assignment.AssignmentManager.regionClosedWithoutPersistingToMeta(AssignmentManager.java:2234)
at 
org.apache.hadoop.hbase.master.assignment.CloseRegionProcedure.restoreSucceedState(CloseRegionProcedure.java:116)
at 
org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.stateLoaded(RegionRemoteProcedureBase.java:287)
... 16 more
 
 
# my question
The {{RegionRemoteProcedureBase#restoreSucceedState}} method is only entered 
when the procedure state is 
{{RegionRemoteProcedureBaseState.REGION_REMOTE_PROCEDURE_REPORT_SUCCEED}}. 
In that case, is it possible to skip the expected-state verification when 
{{regionNode.transitionState}} is executed?
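
To make the question concrete, here is a minimal Java sketch contrasting a
checked transition with an unchecked one. Everything in it is hypothetical:
the class LenientRestoreSketch and the methods strict and lenient are
illustrative names, not HBase APIs, and IllegalStateException stands in for
UnexpectedStateException.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical comparison of a checked and an unchecked state update.
class LenientRestoreSketch {
    enum State { OPENING, OPEN, CLOSING, CLOSED }

    /** Checked variant: refuses the update unless 'current' is in 'expected'. */
    static State strict(State current, State update, Set<State> expected) {
        if (!expected.contains(current)) {
            throw new IllegalStateException("Expected " + expected
                + " so could move to " + update + " but current state=" + current);
        }
        return update;
    }

    /** Unchecked variant, as the question proposes: applies the update as-is. */
    static State lenient(State current, State update) {
        return update;
    }

    public static void main(String[] args) {
        State fromMeta = State.OPENING; // state recovered from META in this report
        Set<State> expected = EnumSet.of(State.CLOSING, State.CLOSED);

        // The unchecked path always succeeds.
        System.out.println("lenient -> " + lenient(fromMeta, State.CLOSED));

        // The checked path rejects the same update.
        try {
            strict(fromMeta, State.CLOSED, expected);
        } catch (IllegalStateException e) {
            System.out.println("strict rejected: " + e.getMessage());
        }
    }
}
```

Note that in this model the unchecked path leaves the in-memory state at
CLOSED while the persisted state is still OPENING. Whether that divergence is
acceptable depends on what the parent procedure does next, which this sketch
does not attempt to answer.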
 
 
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
