[
https://issues.apache.org/jira/browse/HBASE-19839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16334821#comment-16334821
]
Umesh Agashe commented on HBASE-19839:
--------------------------------------
This looks similar to a downstream failure that [~appy] and I debugged a
while back. Here is a summary of the findings:
The test TestMergeTableRegionsProcedure.testRollbackAndDoubleExecution() tests
rollback of MergeTableRegionsProcedure with double execution
(killAndTogglebeforeStore set to true). After 5 steps, abort() is called, which
triggers rollback of the procedure. Rollback involves submitting 2 instances of
AssignProcedure for the 2 regions under consideration for merge. This is done
asynchronously, i.e. MergeTableRegionsProcedure does not wait for the
AssignProcedure instances to complete.
The test fails on: assertEquals(true, procExec.isRunning()) in
MasterProcedureTestingUtility.testRollbackAndDoubleExecution().
When abort() of MergeTableRegionsProcedure is done, the for loop in
MasterProcedureTestingUtility.testRollbackAndDoubleExecution() terminates on the
condition !procExec.isFinished(procId), and thread execution proceeds to the
assert statement mentioned above. If there is a thread switch before the assert
(as in this case), ProcedureExecutor completes a step of one of the pending
procedures. Just before storing the state, ProcedureExecutor checks
killBeforeStore. If it is true, it stops the executor. The thread executing the
test then fails on assertEquals(true, procExec.isRunning()).
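The interleaving described above can be reproduced in a toy model. This is only a sketch: ToyExecutor, executeStep() and observeIsRunningAfterRollback() are hypothetical stand-ins, not HBase classes or methods, and a latch is used to force the thread switch that happens only occasionally in the real test.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Toy model only: hypothetical stand-ins, not HBase's ProcedureExecutor.
class RollbackRaceSketch {

    /** Minimal stand-in for an executor with a kill-before-store test hook. */
    static final class ToyExecutor {
        private final AtomicBoolean running = new AtomicBoolean(true);
        private final boolean killBeforeStore;

        ToyExecutor(boolean killBeforeStore) {
            this.killBeforeStore = killBeforeStore;
        }

        boolean isRunning() {
            return running.get();
        }

        /** Runs one step of a pending procedure. */
        void executeStep() {
            if (killBeforeStore) {
                // The test hook stops the executor just before the state is stored.
                running.set(false);
                return;
            }
            // In the real system the procedure state would be persisted here.
        }
    }

    /** Forces the unlucky interleaving and returns what the test thread observes. */
    static boolean observeIsRunningAfterRollback(boolean killBeforeStore) {
        ToyExecutor exec = new ToyExecutor(killBeforeStore);
        CountDownLatch stepDone = new CountDownLatch(1);
        // Stand-in for an AssignProcedure that rollback() submitted without waiting.
        Thread worker = new Thread(() -> {
            exec.executeStep();
            stepDone.countDown();
        });
        worker.start();
        try {
            // The latch makes the thread switch deterministic: the pending step
            // always completes before the test thread reaches its assertion.
            stepDone.await();
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return exec.isRunning();
    }

    public static void main(String[] args) {
        System.out.println("hook on:  isRunning=" + observeIsRunningAfterRollback(true));
        System.out.println("hook off: isRunning=" + observeIsRunningAfterRollback(false));
    }
}
```

With the hook enabled, the value the test thread reads is false, which corresponds to the failing assertEquals(true, procExec.isRunning()).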
Based on the root cause above, the presence of AssignProcedure instances
submitted by MergeTableRegionsProcedure.rollback() causes this problem, and the
fix could be:
* rollback() should wait for these instances to finish. This can be done
either by:
** calling submitAndWait(), or
** making rollback() return subprocedures, just like execute(), and
maintaining a separate rollback stack.
* Make the test wait for all procedures to finish.
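The "wait for the assigns to finish" idea can be illustrated with plain java.util.concurrent. This is a sketch under assumptions: rollbackAndWait() and the region names are made up for illustration, and the tasks are trivial counters rather than real AssignProcedure instances. The point is only that the caller does not move on until every submitted task has completed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Illustration only: hypothetical names, not the HBase procedure framework.
class WaitForRollbackSketch {

    /**
     * Submits one "assign" task per region and blocks until all of them have
     * completed. Returns the number of tasks that ran.
     */
    static int rollbackAndWait(List<String> regions) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        AtomicInteger assigned = new AtomicInteger();
        try {
            List<CompletableFuture<Void>> pending = new ArrayList<>();
            for (String region : regions) {
                // Stand-in for submitting an AssignProcedure for this region.
                pending.add(CompletableFuture.runAsync(
                        () -> assigned.incrementAndGet(), pool));
            }
            // Block here so nothing is still in flight when the caller continues.
            CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
            return assigned.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> regions = new ArrayList<>();
        regions.add("regionA");
        regions.add("regionB");
        System.out.println("assigns completed: " + rollbackAndWait(regions));
    }
}
```

The same blocking step could live in the test instead of in rollback(), which matches the last option in the list.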
A more radical solution is to remove rollback(), since the same effect can be
achieved through regular state transitions or by submitting child/sub-procedures.
A helper function like registerRollback(Procedure rollbackProc) could be
considered.
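One possible reading of the registerRollback() idea, sketched with hypothetical types: compensating procedures are pushed onto a stack as the procedure makes progress and are run to completion in reverse order when it is aborted. ToyProcedure, unwind() and demo() are invented for illustration; only the name registerRollback comes from the comment above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch; not an existing HBase API.
class RegisterRollbackSketch {

    /** Stand-in for a procedure; not HBase's Procedure class. */
    interface ToyProcedure {
        void execute();
    }

    private final Deque<ToyProcedure> rollbackStack = new ArrayDeque<>();

    /** Records a compensating procedure to run if this procedure is aborted. */
    void registerRollback(ToyProcedure rollbackProc) {
        rollbackStack.push(rollbackProc);
    }

    /** Runs compensations in reverse registration order, each to completion. */
    void unwind() {
        while (!rollbackStack.isEmpty()) {
            rollbackStack.pop().execute();
        }
    }

    /** Registers two compensations and returns the order in which they ran. */
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        RegisterRollbackSketch proc = new RegisterRollbackSketch();
        proc.registerRollback(() -> log.add("assign regionA"));
        proc.registerRollback(() -> log.add("assign regionB"));
        proc.unwind();
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because unwind() runs each compensation synchronously, nothing is left pending once it returns, which is the property the asynchronous rollback() lacks.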
> Flakey TestMergeTableRegionsProcedure
> -------------------------------------
>
> Key: HBASE-19839
> URL: https://issues.apache.org/jira/browse/HBASE-19839
> Project: HBase
> Issue Type: Sub-task
> Components: flakey, test
> Reporter: stack
> Assignee: Umesh Agashe
> Priority: Major
> Fix For: 2.0.0-beta-2
>
>
> Failing about 10% of the time:
> [https://builds.apache.org/job/HBASE-Flaky-Tests-branch2.0/590/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure.txt]
> It's a good one. Probably an issue down deep.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)