[
https://issues.apache.org/jira/browse/HBASE-21278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16645951#comment-16645951
]
Duo Zhang commented on HBASE-21278:
-----------------------------------
OK, I found the problem. When rolling back a procedure, we also need to acquire
the lock. And for TRSP, we need to wait until meta is loaded, but when the
procedure is woken up by the meta loaded event, we add it back into the
scheduler and try to execute it, not roll it back...
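A minimal, hypothetical model of that wake-up path, for illustration only: `Proc`, `Event`, `rollingBack` and `drain` are invented names, not HBase's ProcedureExecutor, ProcedureEvent or MasterProcedureScheduler code. It shows how a procedure that was suspended while rolling back ends up executed once the event wakes it, unless the dispatch looks at the rollback state. The flag check is an assumption about what a fix could look like, not the actual fix.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical, heavily simplified model; NOT the HBase procedure framework API.
public class WakeUpSketch {
  static final class Proc {
    final long pid;
    boolean rollingBack; // assumed flag: set once the executor decides to roll back
    final List<String> trace = new ArrayList<>();
    Proc(long pid) { this.pid = pid; }
    void execute() { trace.add("execute"); }
    void rollback() { trace.add("rollback"); }
  }

  // Stand-in for an event such as "meta loaded" that procedures can wait on.
  static final class Event {
    private boolean ready;
    private final List<Proc> waiters = new ArrayList<>();
    boolean suspendIfNotReady(Proc p) {
      if (ready) return false;
      waiters.add(p);
      return true;
    }
    List<Proc> wake() {
      ready = true;
      List<Proc> woken = new ArrayList<>(waiters);
      waiters.clear();
      return woken;
    }
  }

  static final Queue<Proc> runQueue = new ArrayDeque<>();

  // dispatchOnFlag=false mimics the reported behavior: every woken procedure is executed.
  static void drain(boolean dispatchOnFlag) {
    Proc p;
    while ((p = runQueue.poll()) != null) {
      if (dispatchOnFlag && p.rollingBack) {
        p.rollback();
      } else {
        p.execute();
      }
    }
  }

  static List<String> run(boolean dispatchOnFlag) {
    runQueue.clear();
    Event metaLoaded = new Event();
    Proc trsp = new Proc(43);
    trsp.rollingBack = true;             // the parent failed, so this child is being rolled back
    metaLoaded.suspendIfNotReady(trsp);  // taking the lock has to wait for meta
    runQueue.addAll(metaLoaded.wake());  // meta loaded: the waiter goes back to the run queue
    drain(dispatchOnFlag);
    return trsp.trace;
  }

  public static void main(String[] args) {
    System.out.println("buggy: " + run(false)); // prints "buggy: [execute]"
    System.out.println("fixed: " + run(true));  // prints "fixed: [rollback]"
  }
}
```

In the model the woken procedure simply rejoins the normal run queue, so nothing remembers that it was suspended in the middle of a rollback; that is the gap described above.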
I think the first thing here is to decide what the correct behavior is. In the
current design, when rolling back a procedure, we roll back the sub-procedures
first. At least for MergeTableRegionsProcedure, this does not make sense. There
is no rollback for TRSP, and we also schedule new TRSPs to roll back the state
when rolling back the MergeTableRegionsProcedure, so we do not need to roll
back the sub-procedures...
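A hypothetical sketch of the two rollback orders being compared; `Proc`, `rollsBackChildren` and `merge` are invented for illustration and do not correspond to HBase classes. With children rolled back first, the first child without rollback support throws UnsupportedOperationException, mirroring the trace in the description. With the parent opting out, only the parent's own rollback runs; here that single log entry stands in for the merge scheduling new TRSPs to restore region state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of rollback ordering; not HBase's ProcedureExecutor logic.
public class RollbackOrderSketch {
  static final class Proc {
    final String name;
    final boolean supportsRollback;  // TRSP has no rollback in the report
    final boolean rollsBackChildren; // false = parent compensates on its own
    final List<Proc> children = new ArrayList<>();
    Proc(String name, boolean supportsRollback, boolean rollsBackChildren) {
      this.name = name;
      this.supportsRollback = supportsRollback;
      this.rollsBackChildren = rollsBackChildren;
    }
  }

  // Children are rolled back before their parent, newest first, unless the parent opts out.
  static void rollbackTree(Proc p, List<String> log) {
    if (p.rollsBackChildren) {
      for (int i = p.children.size() - 1; i >= 0; i--) {
        rollbackTree(p.children.get(i), log);
      }
    }
    if (!p.supportsRollback) {
      throw new UnsupportedOperationException("no rollback for " + p.name);
    }
    log.add("rollback " + p.name);
  }

  // A merge parent with two unassign children that cannot be rolled back.
  static Proc merge(boolean rollsBackChildren) {
    Proc m = new Proc("merge", true, rollsBackChildren);
    m.children.add(new Proc("unassign-a", false, true));
    m.children.add(new Proc("unassign-b", false, true));
    return m;
  }

  public static void main(String[] args) {
    List<String> log = new ArrayList<>();
    try {
      rollbackTree(merge(true), log);
    } catch (UnsupportedOperationException e) {
      System.out.println("children first: " + e.getMessage()); // no rollback for unassign-b
    }
    log.clear();
    rollbackTree(merge(false), log);
    System.out.println("parent only: " + log); // [rollback merge]
  }
}
```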
> TestMergeTableRegionsProcedure is flaky
> ---------------------------------------
>
> Key: HBASE-21278
> URL: https://issues.apache.org/jira/browse/HBASE-21278
> Project: HBase
> Issue Type: Bug
> Reporter: Duo Zhang
> Priority: Major
> Attachments:
> org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure-output.txt
>
>
> https://builds.apache.org/job/HBase-Flaky-Tests/job/master/1235/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure-output.txt/*view*/
> I think the problem is
> {noformat}
> 2018-10-08 03:44:30,315 INFO [PEWorker-1] procedure.MasterProcedureScheduler(689): pid=43, ppid=42, state=SUCCESS, hasLock=false; TransitRegionStateProcedure table=testRollbackAndDoubleExecution, region=9bac7c539ac0cff6dc5706ed375a3bfb, UNASSIGN checking lock on 9bac7c539ac0cff6dc5706ed375a3bfb
> 2018-10-08 03:44:30,320 ERROR [PEWorker-1] helpers.MarkerIgnoringBase(159): CODE-BUG: Uncaught runtime exception for pid=43, ppid=42, state=SUCCESS, hasLock=true; TransitRegionStateProcedure table=testRollbackAndDoubleExecution, region=9bac7c539ac0cff6dc5706ed375a3bfb, UNASSIGN
> java.lang.UnsupportedOperationException
>   at org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.rollbackState(TransitRegionStateProcedure.java:458)
>   at org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.rollbackState(TransitRegionStateProcedure.java:97)
>   at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:208)
>   at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:957)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1605)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1567)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1446)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:76)
> {noformat}
> Typically there is no rollback for TRSP. Need to dig more.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)