[ https://issues.apache.org/jira/browse/HBASE-21278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16646035#comment-16646035 ]
Duo Zhang commented on HBASE-21278:
-----------------------------------
I think there are two scenarios in which we want to roll back a procedure:
1. The procedure is aborted.
2. One of its sub-procedures fails.
I think the proper way to roll back a procedure is:
1. If there are still running sub-procedures, wait until they are all done.
2. Roll back this procedure.
3. Recursively roll back the parent procedure.
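The proposed order can be sketched as follows. This is a simplified, hypothetical model, not HBase's Procedure or ProcedureExecutor API: the `Proc` class, its `running` flag and the `rollback` method are invented for illustration. It shows that a sub-procedure which already completed successfully is never rolled back itself; only the failed or aborted procedure and its ancestors are.

```java
import java.util.ArrayList;
import java.util.List;

public class ProposedRollback {

  /** Hypothetical procedure node; not the HBase Procedure class. */
  static final class Proc {
    final String name;
    final Proc parent;
    final List<Proc> children = new ArrayList<>();
    boolean running = true; // false once it completed, failed or was rolled back

    Proc(String name, Proc parent) {
      this.name = name;
      this.parent = parent;
      if (parent != null) {
        parent.children.add(this);
      }
    }
  }

  /** Order in which procedures were rolled back, for illustration only. */
  static final List<String> rolledBack = new ArrayList<>();

  /**
   * Rolls back proc, then its ancestors. Returns null once the whole chain is
   * rolled back, or the procedure to resume from if it had to wait.
   */
  static Proc rollback(Proc proc) {
    if (proc == null) {
      return null; // passed the root: nothing left to roll back
    }
    // Step 1: wait until all sub-procedures of this procedure are done.
    for (Proc child : proc.children) {
      if (child.running) {
        return proc;
      }
    }
    // Step 2: roll back this procedure only; finished children are not touched.
    rolledBack.add(proc.name);
    proc.running = false;
    // Step 3: recursively roll back the parent.
    return rollback(proc.parent);
  }

  public static void main(String[] args) {
    Proc merge = new Proc("merge", null);
    Proc unassignA = new Proc("unassignA", merge);
    Proc unassignB = new Proc("unassignB", merge);

    unassignA.running = false;         // unassignA failed
    Proc resume = rollback(unassignA); // unassignB is still running, so wait
    System.out.println(rolledBack + " waiting on " + resume.name);
    // prints "[unassignA] waiting on merge"

    unassignB.running = false;         // unassignB completed successfully
    rollback(resume);
    System.out.println(rolledBack);    // prints "[unassignA, merge]"
  }
}
```

Here `rollback` returns the procedure it is waiting on so that a caller could retry from that point once the remaining sub-procedures finish; that retry strategy is an assumption of the sketch, not something stated above.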
For now, the logic is almost the same as above, but we store the rollback steps
in a complicated way: we record the execution of sub-procedures, and we also roll
back the sub-procedures when rolling back a procedure.
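To contrast with the proposed order, here is a simplified, hypothetical model of the current approach as described above: a single stack records the executions of a procedure and of its sub-procedures, and rollback replays that stack newest first, sub-procedures included. The names (`StackRollback`, `Step`, `recordExecution`, `rollbackAll`) are invented and this is not HBase's actual code. It shows how a sub-procedure that already finished but does not support rollback, as `TransitRegionStateProcedure.rollbackState` does not in the stack trace quoted below, would stop the parent's rollback; whether that is really what happens in the flaky test is still unconfirmed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class StackRollback {

  /** Hypothetical recorded execution step; not the HBase Procedure API. */
  interface Step {
    String name();
    void rollback(); // may throw UnsupportedOperationException
  }

  static Step step(String name, boolean supportsRollback, List<String> log) {
    return new Step() {
      public String name() { return name; }
      public void rollback() {
        if (!supportsRollback) {
          // Mirrors a procedure whose rollback is unsupported.
          throw new UnsupportedOperationException(name);
        }
        log.add(name);
      }
    };
  }

  /** Executions of the procedure and all its sub-procedures, newest on top. */
  final Deque<Step> rollbackStack = new ArrayDeque<>();

  void recordExecution(Step s) {
    rollbackStack.push(s);
  }

  /** Undoes every recorded execution, newest first, sub-procedures included. */
  void rollbackAll() {
    while (!rollbackStack.isEmpty()) {
      rollbackStack.peek().rollback(); // if this throws, the step stays on the stack
      rollbackStack.pop();
    }
  }

  public static void main(String[] args) {
    List<String> log = new ArrayList<>();
    StackRollback state = new StackRollback();
    state.recordExecution(step("merge:prepare", true, log));
    state.recordExecution(step("unassign", false, log)); // finished sub-procedure
    state.recordExecution(step("merge:next", true, log));
    try {
      state.rollbackAll();
    } catch (UnsupportedOperationException e) {
      // prints "rolled back [merge:next], then failed on unassign"
      System.out.println("rolled back " + log + ", then failed on " + e.getMessage());
    }
  }
}
```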
Let me review the code carefully to see what is going on...
> TestMergeTableRegionsProcedure is flaky
> ---------------------------------------
>
> Key: HBASE-21278
> URL: https://issues.apache.org/jira/browse/HBASE-21278
> Project: HBase
> Issue Type: Bug
> Reporter: Duo Zhang
> Priority: Major
> Attachments:
> org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure-output.txt
>
>
> https://builds.apache.org/job/HBase-Flaky-Tests/job/master/1235/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure-output.txt/*view*/
> I think the problem is:
> {noformat}
> 2018-10-08 03:44:30,315 INFO [PEWorker-1]
> procedure.MasterProcedureScheduler(689): pid=43, ppid=42, state=SUCCESS,
> hasLock=false; TransitRegionStateProcedure
> table=testRollbackAndDoubleExecution,
> region=9bac7c539ac0cff6dc5706ed375a3bfb, UNASSIGN checking lock on
> 9bac7c539ac0cff6dc5706ed375a3bfb
> 2018-10-08 03:44:30,320 ERROR [PEWorker-1] helpers.MarkerIgnoringBase(159):
> CODE-BUG: Uncaught runtime exception for pid=43, ppid=42, state=SUCCESS,
> hasLock=true; TransitRegionStateProcedure
> table=testRollbackAndDoubleExecution,
> region=9bac7c539ac0cff6dc5706ed375a3bfb, UNASSIGN
> java.lang.UnsupportedOperationException
> at
> org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.rollbackState(TransitRegionStateProcedure.java:458)
> at
> org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.rollbackState(TransitRegionStateProcedure.java:97)
> at
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:208)
> at
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:957)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1605)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1567)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1446)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:76)
> {noformat}
> Typically, TRSP (TransitRegionStateProcedure) does not support rollback. Need to dig deeper.