[
https://issues.apache.org/jira/browse/HBASE-29552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18081908#comment-18081908
]
Umesh Kumar Kumawat commented on HBASE-29552:
---------------------------------------------
[~qzwsx] can you tell me the affected version here? It seems things have
changed since then.
> RegionRemoteProcedureBase inconsistent state loading caused startup failure.
> ----------------------------------------------------------------------------
>
> Key: HBASE-29552
> URL: https://issues.apache.org/jira/browse/HBASE-29552
> Project: HBase
> Issue Type: Bug
> Reporter: qazwsx
> Priority: Major
>
> Before the power failure, region 9bf8064aa66e5c6391bcf1d291f5e3fa was being
> moved by the balancer, which ran a TransitRegionStateProcedure (REOPEN/MOVE).
> Because part of the data that HDFS held only in memory was not persisted when
> the power failed, the region state recorded in the META table was OPENING,
> and the procedure record with pid=53510 (the OpenRegionProcedure) was lost.
>
> After restart, when the Master reloaded the CloseRegionProcedure (pid=53505),
> the transitionState call in its restore step failed, and as a result the
> Master failed to become active.
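The failure can be reproduced in miniature with a simplified model of the state guard. This is not HBase code: `State`, `RegionNode`, `transitionState` and `restoreCloseSucceeded` below are hypothetical stand-ins that only mimic the check reported in the stack trace (expected [CLOSING, CLOSED], current OPENING). The sketch assumes that on restart the region state is loaded from META (OPENING) while the CloseRegionProcedure is restored as having already succeeded, so restoring its result tries to move the region to CLOSED.

```java
import java.util.Arrays;

public class ReplaySketch {
    // Hypothetical subset of region states; a stand-in, not HBase's RegionState.State.
    enum State { OPENING, OPEN, CLOSING, CLOSED }

    // Hypothetical stand-in for HBase's UnexpectedStateException.
    static class UnexpectedStateException extends Exception {
        UnexpectedStateException(String msg) { super(msg); }
    }

    // Hypothetical stand-in for RegionStateNode: holds the state loaded from META.
    static class RegionNode {
        State state;

        RegionNode(State loadedFromMeta) { this.state = loadedFromMeta; }

        // Mimics the guard seen in the stack trace: move only if the current state is expected.
        void transitionState(State target, State... expected) throws UnexpectedStateException {
            if (!Arrays.asList(expected).contains(state)) {
                throw new UnexpectedStateException("Expected " + Arrays.toString(expected)
                    + " so could move to " + target + " but current state=" + state);
            }
            state = target;
        }
    }

    // Replays the restore step of a close procedure that had already reported success.
    static String restoreCloseSucceeded(RegionNode node) {
        try {
            node.transitionState(State.CLOSED, State.CLOSING, State.CLOSED);
            return "ok: " + node.state;
        } catch (UnexpectedStateException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // META row as found after the power loss: OPENING.
        System.out.println(restoreCloseSucceeded(new RegionNode(State.OPENING)));
        // Consistent case for comparison: META still says CLOSING.
        System.out.println(restoreCloseSucceeded(new RegionNode(State.CLOSING)));
    }
}
```

With META at OPENING the model produces the same message as the real error; with CLOSING the transition goes through.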
>
> # log
> Before the power failure:
> 2025-08-24 13:53:33,254 | INFO | master/ndp-hbase-master-1:16000.Chore.5 |
> balance hri=9bf8064aa66e5c6391bcf1d291f5e3fa,
> source=ndp-hbase-region-0.hbaseregion.sop.svc.cluster.local,16020,1756010506000,
> destination=ndp-hbase-region-1.hbaseregion.sop.svc.cluster.local,16020,1756010435228
> | org.apache.hadoop.hbase.master.HMaster.executeRegionPlansWithThrottling(HMaster.java:1987)
> 2025-08-24 13:53:33,266 | INFO | PEWorker-10 | Initialized
> subprocedures=[{pid=53505, ppid=53504, state=RUNNABLE; CloseRegionProcedure
> 9bf8064aa66e5c6391bcf1d291f5e3fa,
> server=ndp-hbase-region-0.hbaseregion.sop.svc.cluster.local,16020,1756010506000}]
> |
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1685)
> 2025-08-24 13:53:33,423 | INFO | RSProcedureDispatcher-pool-23 | Using
> KERBEROS authentication for service=AdminService, sasl=true, type='kerberos'
> | org.apache.hadoop.hbase.ipc.RpcConnection.<init>(RpcConnection.java:124)
> 2025-08-24 14:01:50,565 | INFO | PEWorker-15 | pid=53504 updating hbase:meta
> row=9bf8064aa66e5c6391bcf1d291f5e3fa, regionState=CLOSED |
> org.apache.hadoop.hbase.master.assignment.RegionStateStore.createPutForRegionLocUpdate(RegionStateStore.java:253)
> 2025-08-24 14:01:50,569 | INFO | PEWorker-15 | Finished pid=53505,
> ppid=53504, state=SUCCESS; CloseRegionProcedure
> 9bf8064aa66e5c6391bcf1d291f5e3fa,
> server=ndp-hbase-region-0.hbaseregion.sop.svc.cluster.local,16020,1756010506000
> in 8 mins, 17.302 sec |
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1414)
> 2025-08-24 14:01:50,569 | INFO | PEWorker-12 | Starting pid=53504,
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, locked=true;
> TransitRegionStateProcedure table=student001,
> region=9bf8064aa66e5c6391bcf1d291f5e3fa, REOPEN/MOVE; state=CLOSED,
> location=ndp-hbase-region-1.hbaseregion.sop.svc.cluster.local,16020,1756010435228;
> forceNewPlan=false, retain=false |
> org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.queueAssign(TransitRegionStateProcedure.java:250)
> 2025-08-24 14:01:50,720 | INFO | PEWorker-18 | pid=53504 updating hbase:meta
> row=9bf8064aa66e5c6391bcf1d291f5e3fa, regionState=OPENING,
> regionLocation=ndp-hbase-region-1.hbaseregion.sop.svc.cluster.local,16020,1756010435228
> |
> org.apache.hadoop.hbase.master.assignment.RegionStateStore.createPutForRegionLocUpdate(RegionStateStore.java:253)
> 2025-08-24 14:01:50,726 | INFO | PEWorker-18 | Initialized
> subprocedures=[{pid=53510, ppid=53504, state=RUNNABLE; OpenRegionProcedure
> 9bf8064aa66e5c6391bcf1d291f5e3fa,
> server=ndp-hbase-region-1.hbaseregion.sop.svc.cluster.local,16020,1756010435228}]
> |
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1685)
> 2025-08-24 14:01:51,054 | INFO | PEWorker-5 | pid=53504 updating hbase:meta
> row=9bf8064aa66e5c6391bcf1d291f5e3fa, regionState=OPEN, openSeqNum=96213,
> regionLocation=ndp-hbase-region-1.hbaseregion.sop.svc.cluster.local,16020,1756010435228
> |
> org.apache.hadoop.hbase.master.assignment.RegionStateStore.createPutForRegionLocUpdate(RegionStateStore.java:253)
> 2025-08-24 14:01:51,059 | INFO | PEWorker-5 | Finished pid=53510,
> ppid=53504, state=SUCCESS; OpenRegionProcedure
> 9bf8064aa66e5c6391bcf1d291f5e3fa,
> server=ndp-hbase-region-1.hbaseregion.sop.svc.cluster.local,16020,1756010435228
> in 330 msec |
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1414)
> 2025-08-24 14:01:51,060 | INFO | PEWorker-7 | Finished pid=53504,
> state=SUCCESS; TransitRegionStateProcedure table=student001,
> region=9bf8064aa66e5c6391bcf1d291f5e3fa, REOPEN/MOVE in 8 mins, 17.804 sec |
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1414)
>
> Startup failure after recovering from the power loss:
> 2025-08-24 14:58:19,266 | ERROR |
> master/ndp-hbase-master-1:16000:becomeActiveMaster | Failed to become active
> master |
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2393)
> java.lang.AssertionError:
> org.apache.hadoop.hbase.exceptions.UnexpectedStateException: Expected
> [CLOSING, CLOSED] so could move to CLOSED but current state=OPENING
> at org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.stateLoaded(RegionRemoteProcedureBase.java:290)
> at org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.stateLoaded(TransitRegionStateProcedure.java:668)
> at org.apache.hadoop.hbase.master.assignment.AssignmentManager$RegionMetaLoadingVisitor.visitRegionState(AssignmentManager.java:1879)
> at org.apache.hadoop.hbase.master.assignment.RegionStateStore.visitMetaEntry(RegionStateStore.java:153)
> at org.apache.hadoop.hbase.master.assignment.RegionStateStore.access$100(RegionStateStore.java:66)
> at org.apache.hadoop.hbase.master.assignment.RegionStateStore$1.visit(RegionStateStore.java:95)
> at org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:809)
> at org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:755)
> at org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:716)
> at org.apache.hadoop.hbase.MetaTableAccessor.fullScanRegions(MetaTableAccessor.java:193)
> at org.apache.hadoop.hbase.master.assignment.RegionStateStore.visitMeta(RegionStateStore.java:85)
> at org.apache.hadoop.hbase.master.assignment.AssignmentManager.loadMeta(AssignmentManager.java:1909)
> at org.apache.hadoop.hbase.master.assignment.AssignmentManager.joinCluster(AssignmentManager.java:1779)
> at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1035)
> at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2389)
> at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:558)
> at java.lang.Thread.run(Thread.java:750)
> Caused by: org.apache.hadoop.hbase.exceptions.UnexpectedStateException:
> Expected [CLOSING, CLOSED] so could move to CLOSED but current state=OPENING
> at org.apache.hadoop.hbase.master.assignment.RegionStateNode.transitionState(RegionStateNode.java:142)
> at org.apache.hadoop.hbase.master.assignment.AssignmentManager.regionClosedWithoutPersistingToMeta(AssignmentManager.java:2234)
> at org.apache.hadoop.hbase.master.assignment.CloseRegionProcedure.restoreSucceedState(CloseRegionProcedure.java:116)
> at org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.stateLoaded(RegionRemoteProcedureBase.java:287)
> ... 16 more
>
>
> # My question
> {{RegionRemoteProcedureBase#restoreSucceedState}} is only entered when the
> procedure state is {{RegionRemoteProcedureBaseState.REGION_REMOTE_PROCEDURE_REPORT_SUCCEED}},
> i.e. after the region server has reported that the remote operation
> succeeded. In that case, is it possible to skip the expected-state check
> when {{regionNode.transitionState}} is called?
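To make the question concrete, here is a hypothetical sketch of what a relaxed restore step could look like. It is not a proposed HBase patch: `restoreSucceedStateLenient`, the `warnings` list (standing in for a logger) and the `State` enum are illustrative names only. It assumes the relaxation applies solely on the REPORT_SUCCEED restore path, where the remote close is already known to have succeeded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LenientRestoreSketch {
    // Hypothetical subset of region states; a stand-in, not HBase's enum.
    enum State { OPENING, OPEN, CLOSING, CLOSED }

    // Collected warnings, standing in for a logger.
    static final List<String> warnings = new ArrayList<>();

    // Hypothetical relaxed restore: the remote close already reported success,
    // so an unexpected current state is recorded as a warning instead of failing startup.
    static State restoreSucceedStateLenient(State current) {
        List<State> expected = Arrays.asList(State.CLOSING, State.CLOSED);
        if (!expected.contains(current)) {
            // Trade-off: the mismatch is no longer fatal, but it is only visible in the log.
            warnings.add("close already reported success; overriding state "
                + current + " with CLOSED");
        }
        return State.CLOSED;
    }

    public static void main(String[] args) {
        System.out.println(restoreSucceedStateLenient(State.OPENING)); // CLOSED, one warning recorded
        System.out.println(restoreSucceedStateLenient(State.CLOSING)); // CLOSED, no warning
        System.out.println(warnings.size());
    }
}
```

The sketch only shows where the check would be relaxed. Skipping it lets loading continue, but it overwrites whatever META reported without verification, which could hide a genuinely inconsistent region state; whether that is safe depends on what the parent TransitRegionStateProcedure does next.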
>
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)