[ 
https://issues.apache.org/jira/browse/HBASE-29552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18081908#comment-18081908
 ] 

Umesh Kumar Kumawat commented on HBASE-29552:
---------------------------------------------

[~qzwsx] can you help me with the affected version here? It seems things have 
changed now. 

> RegionRemoteProcedureBase inconsistent state loading caused startup failure.
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-29552
>                 URL: https://issues.apache.org/jira/browse/HBASE-29552
>             Project: HBase
>          Issue Type: Bug
>            Reporter: qazwsx
>            Priority: Major
>
> Before the power failure, the Region (9bf8064aa66e5c6391bcf1d291f5e3fa) was 
> being balanced, which triggered a TransitRegionStateProcedure to execute 
> the Move operation. Because part of the in-memory HDFS data had not been 
> persisted when the power failed, the Region state recorded in the META table 
> was OPENING, and the Procedure record with pid=53510 was lost.
>  
> After the system restarted, loadProcedure reloaded the CloseRegionProcedure 
> and its transitionState operation failed, which ultimately prevented the 
> Master service from starting.
>  
> # log
> before power off: 
> 2025-08-24 13:53:33,254 | INFO  | master/ndp-hbase-master-1:16000.Chore.5 | 
> balance hri=9bf8064aa66e5c6391bcf1d291f5e3fa, 
> source=ndp-hbase-region-0.hbaseregion.sop.svc.cluster.local,16020,1756010506000,
>  
> destination=ndp-hbase-region-1.hbaseregion.sop.svc.cluster.local,16020,1756010435228
>  | 
> org.apache.hadoop.hbase.master.HMaster.executeRegionPlansWithThrottling(HMaster.java:1987)
> 2025-08-24 13:53:33,266 | INFO  | PEWorker-10 | Initialized 
> subprocedures=[{pid=53505, ppid=53504, state=RUNNABLE; CloseRegionProcedure 
> 9bf8064aa66e5c6391bcf1d291f5e3fa, 
> server=ndp-hbase-region-0.hbaseregion.sop.svc.cluster.local,16020,1756010506000}]
>  | 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1685)
> 2025-08-24 13:53:33,423 | INFO  | RSProcedureDispatcher-pool-23 | Using 
> KERBEROS authentication for service=AdminService, sasl=true, type='kerberos' 
> | org.apache.hadoop.hbase.ipc.RpcConnection.<init>(RpcConnection.java:124)
> 2025-08-24 14:01:50,565 | INFO  | PEWorker-15 | pid=53504 updating hbase:meta 
> row=9bf8064aa66e5c6391bcf1d291f5e3fa, regionState=CLOSED | 
> org.apache.hadoop.hbase.master.assignment.RegionStateStore.createPutForRegionLocUpdate(RegionStateStore.java:253)
> 2025-08-24 14:01:50,569 | INFO  | PEWorker-15 | Finished pid=53505, 
> ppid=53504, state=SUCCESS; CloseRegionProcedure 
> 9bf8064aa66e5c6391bcf1d291f5e3fa, 
> server=ndp-hbase-region-0.hbaseregion.sop.svc.cluster.local,16020,1756010506000
>  in 8 mins, 17.302 sec | 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1414)
> 2025-08-24 14:01:50,569 | INFO  | PEWorker-12 | Starting pid=53504, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, locked=true; 
> TransitRegionStateProcedure table=student001, 
> region=9bf8064aa66e5c6391bcf1d291f5e3fa, REOPEN/MOVE; state=CLOSED, 
> location=ndp-hbase-region-1.hbaseregion.sop.svc.cluster.local,16020,1756010435228;
>  forceNewPlan=false, retain=false | 
> org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.queueAssign(TransitRegionStateProcedure.java:250)
> 2025-08-24 14:01:50,720 | INFO  | PEWorker-18 | pid=53504 updating hbase:meta 
> row=9bf8064aa66e5c6391bcf1d291f5e3fa, regionState=OPENING, 
> regionLocation=ndp-hbase-region-1.hbaseregion.sop.svc.cluster.local,16020,1756010435228
>  | 
> org.apache.hadoop.hbase.master.assignment.RegionStateStore.createPutForRegionLocUpdate(RegionStateStore.java:253)
> 2025-08-24 14:01:50,726 | INFO  | PEWorker-18 | Initialized 
> subprocedures=[{pid=53510, ppid=53504, state=RUNNABLE; OpenRegionProcedure 
> 9bf8064aa66e5c6391bcf1d291f5e3fa, 
> server=ndp-hbase-region-1.hbaseregion.sop.svc.cluster.local,16020,1756010435228}]
>  | 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1685)
> 2025-08-24 14:01:51,054 | INFO  | PEWorker-5 | pid=53504 updating hbase:meta 
> row=9bf8064aa66e5c6391bcf1d291f5e3fa, regionState=OPEN, openSeqNum=96213, 
> regionLocation=ndp-hbase-region-1.hbaseregion.sop.svc.cluster.local,16020,1756010435228
>  | 
> org.apache.hadoop.hbase.master.assignment.RegionStateStore.createPutForRegionLocUpdate(RegionStateStore.java:253)
> 2025-08-24 14:01:51,059 | INFO  | PEWorker-5 | Finished pid=53510, 
> ppid=53504, state=SUCCESS; OpenRegionProcedure 
> 9bf8064aa66e5c6391bcf1d291f5e3fa, 
> server=ndp-hbase-region-1.hbaseregion.sop.svc.cluster.local,16020,1756010435228
>  in 330 msec | 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1414)
> 2025-08-24 14:01:51,060 | INFO  | PEWorker-7 | Finished pid=53504, 
> state=SUCCESS; TransitRegionStateProcedure table=student001, 
> region=9bf8064aa66e5c6391bcf1d291f5e3fa, REOPEN/MOVE in 8 mins, 17.804 sec | 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1414)
>  
> startup failure after power-loss recovery:
> 2025-08-24 14:58:19,266 | ERROR | 
> master/ndp-hbase-master-1:16000:becomeActiveMaster | Failed to become active 
> master | 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2393)
> java.lang.AssertionError: 
> org.apache.hadoop.hbase.exceptions.UnexpectedStateException: Expected 
> [CLOSING, CLOSED] so could move to CLOSED but current state=OPENING
> at 
> org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.stateLoaded(RegionRemoteProcedureBase.java:290)
> at 
> org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.stateLoaded(TransitRegionStateProcedure.java:668)
> at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager$RegionMetaLoadingVisitor.visitRegionState(AssignmentManager.java:1879)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionStateStore.visitMetaEntry(RegionStateStore.java:153)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionStateStore.access$100(RegionStateStore.java:66)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionStateStore$1.visit(RegionStateStore.java:95)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:809)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:755)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:716)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.fullScanRegions(MetaTableAccessor.java:193)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionStateStore.visitMeta(RegionStateStore.java:85)
> at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.loadMeta(AssignmentManager.java:1909)
> at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.joinCluster(AssignmentManager.java:1779)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1035)
> at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2389)
> at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:558)
> at java.lang.Thread.run(Thread.java:750)
> Caused by: org.apache.hadoop.hbase.exceptions.UnexpectedStateException: 
> Expected [CLOSING, CLOSED] so could move to CLOSED but current state=OPENING
> at 
> org.apache.hadoop.hbase.master.assignment.RegionStateNode.transitionState(RegionStateNode.java:142)
> at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.regionClosedWithoutPersistingToMeta(AssignmentManager.java:2234)
> at 
> org.apache.hadoop.hbase.master.assignment.CloseRegionProcedure.restoreSucceedState(CloseRegionProcedure.java:116)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.stateLoaded(RegionRemoteProcedureBase.java:287)
> ... 16 more
>  
>  
> # my question
> The entry condition for the {{RegionRemoteProcedureBase#restoreSucceedState}} 
> method is 
> {{RegionRemoteProcedureBaseState.REGION_REMOTE_PROCEDURE_REPORT_SUCCEED}}. 
> Is it possible to skip the expected result verification when 
> {{regionNode.transitionState}} is executed?
>  
>  
>  
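
The failure mode in the quoted stack trace can be illustrated with a minimal, self-contained sketch. Every type and method name below (RestoreStateSketch, RegionNode, restoreCloseSucceeded, and so on) is a hypothetical stand-in written for this illustration, not the actual HBase code, and the exception is modeled as unchecked for brevity. The sketch only assumes what the trace shows: a guarded transition that accepts CLOSING or CLOSED as the current state is replayed against a node whose state was loaded as OPENING.

```java
import java.util.Arrays;

// Hypothetical stand-ins for illustration only; these are NOT the HBase classes.
class RestoreStateSketch {

    enum State { OPENING, OPEN, CLOSING, CLOSED }

    // Modeled as unchecked here to keep the sketch short (an assumption).
    static class UnexpectedStateException extends RuntimeException {
        UnexpectedStateException(String msg) { super(msg); }
    }

    // Simplified in-memory region node whose initial state stands for
    // the value read back from the META table at startup.
    static class RegionNode {
        private State state;

        RegionNode(State loadedFromMeta) { this.state = loadedFromMeta; }

        State getState() { return state; }

        // Move to 'target' only if the current state is one of 'expected'.
        void transitionState(State target, State... expected) {
            if (!Arrays.asList(expected).contains(state)) {
                throw new UnexpectedStateException("Expected " + Arrays.toString(expected)
                    + " so could move to " + target + " but current state=" + state);
            }
            state = target;
        }
    }

    // Replays a persisted "close succeeded" result against the node.
    static void restoreCloseSucceeded(RegionNode node) {
        node.transitionState(State.CLOSED, State.CLOSING, State.CLOSED);
    }

    public static void main(String[] args) {
        // Consistent case: the loaded state is CLOSING, so the replay succeeds.
        RegionNode consistent = new RegionNode(State.CLOSING);
        restoreCloseSucceeded(consistent);
        System.out.println("consistent node is now " + consistent.getState());

        // Inconsistent case: the loaded state is OPENING while the replayed
        // procedure still reports a successful close, so the guard rejects it.
        RegionNode stale = new RegionNode(State.OPENING);
        try {
            restoreCloseSucceeded(stale);
        } catch (UnexpectedStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The sketch only reproduces the mismatch. It does not show whether skipping the expected-state check is safe, which is the open question in the issue.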


