[jira] [Commented] (HBASE-21278) TestMergeTableRegionsProcedure is flaky

Duo Zhang (JIRA) Wed, 10 Oct 2018 21:18:22 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-21278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16645951#comment-16645951
 ]


Duo Zhang commented on HBASE-21278:
-----------------------------------

OK, I found the problem. When rolling back a procedure, we also need to acquire 
the lock. And for TRSP (TransitRegionStateProcedure), we need to wait until meta 
is loaded, but when the procedure is woken up by the meta-loaded event, we add it 
to the scheduler and try to execute it rather than roll it back...
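
To make the symptom above concrete, here is a minimal toy sketch in Java. It is 
only one possible reading of the behavior described; the class, enum, and method 
names (WakeDispatchSketch, ToyProcedure, naiveWake, modeAwareWake) are 
hypothetical and are not part of HBase's procedure2 framework.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Toy model only; none of these types exist in HBase. */
class WakeDispatchSketch {

  /** What the executor intended to do with the procedure when it had to wait. */
  enum Mode { EXECUTE, ROLLBACK }

  /** Hypothetical stand-in for a procedure that was suspended on an event. */
  static class ToyProcedure {
    final Mode intended;

    ToyProcedure(Mode intended) {
      this.intended = intended;
    }
  }

  /**
   * Wake-up that forgets why the procedure was waiting: the procedure is put
   * back on the run queue and the worker simply executes whatever it polls.
   */
  static Mode naiveWake(ToyProcedure p) {
    Queue<ToyProcedure> runQueue = new ArrayDeque<>();
    runQueue.add(p);
    runQueue.remove();
    return Mode.EXECUTE; // the intended mode is never consulted
  }

  /** Wake-up where the worker checks the intended mode before dispatching. */
  static Mode modeAwareWake(ToyProcedure p) {
    Queue<ToyProcedure> runQueue = new ArrayDeque<>();
    runQueue.add(p);
    ToyProcedure next = runQueue.remove();
    return next.intended;
  }

  public static void main(String[] args) {
    ToyProcedure waitingForRollback = new ToyProcedure(Mode.ROLLBACK);
    System.out.println("naive wake-up dispatches: " + naiveWake(waitingForRollback));
    System.out.println("mode-aware wake-up dispatches: " + modeAwareWake(waitingForRollback));
  }
}
```

In this toy model, a procedure that was suspended while a rollback was pending 
ends up on the execute path after the naive wake-up, which is the mismatch 
described above. Whether the real fix should look like the mode-aware variant is 
not something the sketch can answer.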

I think the first thing to do here is to decide what the correct behavior is. In 
the current design, when rolling back a procedure, we roll back the sub 
procedures first. At least for MergeTableRegionsProcedure, this does not make 
sense. There is no rollback for TRSP, and we schedule new TRSPs to roll back the 
state when rolling back the MergeTableRegionsProcedure anyway, so we do not need 
to roll back the sub procedures...
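
The two rollback orders discussed above can be compared with a small toy model 
in Java. This is a sketch under stated assumptions, not the actual 
ProcedureExecutor logic: RollbackOrderSketch, ToyProcedure, 
rollbackChildrenFirst, and rollbackParentOnly are hypothetical names, and the 
only detail taken from the report is that the child's rollback throws 
UnsupportedOperationException.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only; these types are not HBase's procedure2 classes. */
class RollbackOrderSketch {

  /** Hypothetical stand-in for a procedure that may have sub procedures. */
  static class ToyProcedure {
    final String name;
    final boolean supportsRollback;
    final List<ToyProcedure> children = new ArrayList<>();

    ToyProcedure(String name, boolean supportsRollback) {
      this.name = name;
      this.supportsRollback = supportsRollback;
    }

    /** Records the rollback, or throws if this procedure has no rollback. */
    void rollback(List<String> log) {
      if (!supportsRollback) {
        throw new UnsupportedOperationException(name + " has no rollback");
      }
      log.add("rollback " + name);
    }
  }

  /** Order described for the current design: sub procedures first, then the parent. */
  static void rollbackChildrenFirst(ToyProcedure parent, List<String> log) {
    for (int i = parent.children.size() - 1; i >= 0; i--) {
      rollbackChildrenFirst(parent.children.get(i), log);
    }
    parent.rollback(log);
  }

  /** Alternative raised in the comment: skip the sub procedures entirely. */
  static void rollbackParentOnly(ToyProcedure parent, List<String> log) {
    parent.rollback(log);
  }

  public static void main(String[] args) {
    ToyProcedure merge = new ToyProcedure("merge", true);
    merge.children.add(new ToyProcedure("trsp-unassign", false));

    List<String> log = new ArrayList<>();
    try {
      rollbackChildrenFirst(merge, log);
    } catch (UnsupportedOperationException e) {
      System.out.println("children-first rollback failed: " + e.getMessage());
    }
    rollbackParentOnly(merge, log);
    System.out.println(log);
  }
}
```

In the toy model, the children-first order fails before the parent gets a chance 
to compensate, while the parent-only order completes. The real parent would 
schedule new TRSPs as part of its own rollback, which the sketch does not model.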



> TestMergeTableRegionsProcedure is flaky
> ---------------------------------------
>
>                 Key: HBASE-21278
>                 URL: https://issues.apache.org/jira/browse/HBASE-21278
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Duo Zhang
>            Priority: Major
>         Attachments: org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure-output.txt
>
>
> https://builds.apache.org/job/HBase-Flaky-Tests/job/master/1235/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure-output.txt/*view*/
> I think the problem is
> {noformat}
> 2018-10-08 03:44:30,315 INFO  [PEWorker-1] procedure.MasterProcedureScheduler(689): pid=43, ppid=42, state=SUCCESS, hasLock=false; TransitRegionStateProcedure table=testRollbackAndDoubleExecution, region=9bac7c539ac0cff6dc5706ed375a3bfb, UNASSIGN checking lock on 9bac7c539ac0cff6dc5706ed375a3bfb
> 2018-10-08 03:44:30,320 ERROR [PEWorker-1] helpers.MarkerIgnoringBase(159): CODE-BUG: Uncaught runtime exception for pid=43, ppid=42, state=SUCCESS, hasLock=true; TransitRegionStateProcedure table=testRollbackAndDoubleExecution, region=9bac7c539ac0cff6dc5706ed375a3bfb, UNASSIGN
> java.lang.UnsupportedOperationException
>       at org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.rollbackState(TransitRegionStateProcedure.java:458)
>       at org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.rollbackState(TransitRegionStateProcedure.java:97)
>       at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:208)
>       at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:957)
>       at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1605)
>       at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1567)
>       at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1446)
>       at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:76)
> {noformat}
> Typically there is no rollback for TRSP. Need to dig more.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
