[
https://issues.apache.org/jira/browse/HBASE-20881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16577508#comment-16577508
]
Duo Zhang commented on HBASE-20881:
-----------------------------------
I looped the test 100 times locally, and it finally failed with:
{noformat}
2018-08-12 19:57:18,174 ERROR [PEWorker-1] helpers.MarkerIgnoringBase(159):
CODE-BUG: Uncaught runtime exception for pid=83,
state=FAILED:SPLIT_TABLE_REGION_UPDATE_META, hasLock=true,
exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via
TransitRegionStateProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException:
Max attempts 10 exceeded; SplitTableRegionProcedure
table=testRecoveryAndDoubleExecution, parent=2b370ab236c7bd08956fc25f712f49e4,
daughterA=de5ab31764b272230cb50ca31b8ecbdb,
daughterB=0d190bf20801e4bb12d6aaf40e971340
java.lang.UnsupportedOperationException: pid=83,
state=FAILED:SPLIT_TABLE_REGION_PRE_OPERATION_AFTER_META, hasLock=true,
exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via
TransitRegionStateProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException:
Max attempts 10 exceeded; SplitTableRegionProcedure
table=testRecoveryAndDoubleExecution, parent=2b370ab236c7bd08956fc25f712f49e4,
daughterA=de5ab31764b272230cb50ca31b8ecbdb,
daughterB=0d190bf20801e4bb12d6aaf40e971340 unhandled
state=SPLIT_TABLE_REGION_PRE_OPERATION_AFTER_META
    at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.rollbackState(SplitTableRegionProcedure.java:320)
    at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.rollbackState(SplitTableRegionProcedure.java:1)
    at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:208)
    at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:886)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1436)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1392)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1270)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$7(ProcedureExecutor.java:1251)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1822)
2018-08-12 19:57:18,174 WARN [PEWorker-1]
procedure2.ProcedureExecutor$Testing(98): Toggle KILL before store update to:
true
2018-08-12 19:57:18,192 ERROR [PEWorker-1] helpers.MarkerIgnoringBase(159):
CODE-BUG: Uncaught runtime exception for pid=83,
state=FAILED:SPLIT_TABLE_REGION_PRE_OPERATION_BEFORE_META, hasLock=true,
exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via
TransitRegionStateProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException:
Max attempts 10 exceeded; SplitTableRegionProcedure
table=testRecoveryAndDoubleExecution, parent=2b370ab236c7bd08956fc25f712f49e4,
daughterA=de5ab31764b272230cb50ca31b8ecbdb,
daughterB=0d190bf20801e4bb12d6aaf40e971340
java.lang.UnsupportedOperationException: pid=83,
state=FAILED:SPLIT_TABLE_REGION_UPDATE_META, hasLock=true,
exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via
TransitRegionStateProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException:
Max attempts 10 exceeded; SplitTableRegionProcedure
table=testRecoveryAndDoubleExecution, parent=2b370ab236c7bd08956fc25f712f49e4,
daughterA=de5ab31764b272230cb50ca31b8ecbdb,
daughterB=0d190bf20801e4bb12d6aaf40e971340 unhandled
state=SPLIT_TABLE_REGION_UPDATE_META
    at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.rollbackState(SplitTableRegionProcedure.java:320)
    at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.rollbackState(SplitTableRegionProcedure.java:1)
    at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:208)
    at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:886)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1436)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1392)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1270)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$7(ProcedureExecutor.java:1251)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1822)
{noformat}
Let me dig more.
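For reference, the shape of the failure in the log above (a rollback reaching a state that has no undo branch and throwing UnsupportedOperationException) can be reproduced with a minimal sketch. This is not the actual SplitTableRegionProcedure code: the class RollbackSketch, the shortened enum, and the choice of which state is treated as undoable are all assumptions made for illustration. Only the two state names reported as "unhandled" in the log are taken from the trace.

```java
public class RollbackSketch {
    // Hypothetical, shortened subset of split states. The last two names mirror
    // the states reported as "unhandled" in the log; the first is illustrative.
    enum SplitState {
        PRE_OPERATION_BEFORE_META,
        UPDATE_META,
        PRE_OPERATION_AFTER_META
    }

    // Rolls back a single state. In this sketch only the first state has an
    // undo branch; every other state falls through to the default and throws.
    static void rollbackState(SplitState state) {
        switch (state) {
            case PRE_OPERATION_BEFORE_META:
                // Assumed to be undoable; nothing to do in this sketch.
                break;
            default:
                throw new UnsupportedOperationException("unhandled state=" + state);
        }
    }

    // Returns the exception message from rollbackState, or null if rollback succeeded.
    static String tryRollback(SplitState state) {
        try {
            rollbackState(state);
            return null;
        } catch (UnsupportedOperationException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Walk the states in reverse, as a rollback would.
        SplitState[] states = SplitState.values();
        for (int i = states.length - 1; i >= 0; i--) {
            String err = tryRollback(states[i]);
            System.out.println(states[i] + " -> " + (err == null ? "rolled back" : err));
        }
    }
}
```

If the sketch matches what happens in the real procedure, the question to dig into is why a rollback was requested at all once the procedure had reached those states, rather than why the default branch throws.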
> Introduce a region transition procedure to handle all the state transition
> for a region
> ---------------------------------------------------------------------------------------
>
> Key: HBASE-20881
> URL: https://issues.apache.org/jira/browse/HBASE-20881
> Project: HBase
> Issue Type: Sub-task
> Components: amv2, proc-v2
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-20881-v1.patch, HBASE-20881-v2.patch,
> HBASE-20881-v3.patch, HBASE-20881-v4.patch, HBASE-20881-v4.patch,
> HBASE-20881-v5.patch, HBASE-20881-v6.patch, HBASE-20881-v7.patch,
> HBASE-20881-v7.patch, HBASE-20881.patch
>
>
> We now have an AssignProcedure, an UnassignProcedure, and also a
> MoveRegionProcedure which schedules an AssignProcedure and an
> UnassignProcedure to move a region. This makes the logic a bit complicated, as
> MRP is not a RIT, so when SCP cannot interrupt it directly...