[ 
https://issues.apache.org/jira/browse/HBASE-21334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658121#comment-16658121
 ] 

Duo Zhang commented on HBASE-21334:
-----------------------------------

OK, I think this is a test issue. MergeTableRegionsProcedure and 
SplitTableRegionProcedure both schedule TRSPs (TransitRegionStateProcedures) 
to bring the regions online. Since the MergeTableRegionsProcedure or 
SplitTableRegionProcedure still holds the lock while rolling back, the TRSPs 
can only be executed after the rollback has finished. And since the test sets 
the kill flag after every step, these TRSPs may also be affected.

We have a piece of code in MasterProcedureTestingUtility to deal with this but 
obviously it does not always work...

{code}
    if (waitForAsyncProcs) {
      // Sometimes there are other procedures still executing (including asynchronously spawned by
      // procId) and due to KillAndToggleBeforeStoreUpdate flag ProcedureExecutor is stopped before
      // store update. Let all pending procedures finish normally.
      if (!procExec.isRunning()) {
        LOG.warn("ProcedureExecutor not running, may have been stopped by pending procedure due to"
            + " KillAndToggleBeforeStoreUpdate flag.");
        ProcedureTestingUtility.setKillAndToggleBeforeStoreUpdate(procExec, false);
        restartMasterProcedureExecutor(procExec);
        ProcedureTestingUtility.waitNoProcedureRunning(procExec);
      }
    }
{code}
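
To make the interaction concrete, here is a minimal toy model. It is not HBase 
code: ToyProc, ToyExecutor and simulate are hypothetical names, and a single 
FIFO queue stands in for the lock, so the TRSP-like children only run once 
the rollback is done. It only sketches why children queued behind the 
rollback still see the kill flag, and what the drain step above is meant to 
do.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class KillFlagSketch {

  /** Hypothetical stand-in for a procedure: a name plus a count of remaining steps. */
  static final class ToyProc {
    final String name;
    int stepsLeft;

    ToyProc(String name, int stepsLeft) {
      this.name = name;
      this.stepsLeft = stepsLeft;
    }
  }

  /** Hypothetical stand-in for an executor that may stop itself after every step. */
  static final class ToyExecutor {
    final Deque<ToyProc> queue = new ArrayDeque<>();
    final List<String> killedBy = new ArrayList<>();
    boolean killAfterEveryStep = true;
    boolean running = true;

    void restart() {
      running = true;
    }

    /** Executes queued steps in FIFO order until the queue drains or the executor stops. */
    void run() {
      while (running && !queue.isEmpty()) {
        ToyProc proc = queue.peek();
        proc.stepsLeft--;
        if (proc.stepsLeft == 0) {
          queue.poll();
        }
        if (killAfterEveryStep) {
          // Assumption: model the test flag as "stop the executor after each step".
          killedBy.add(proc.name);
          running = false;
        }
      }
    }
  }

  /** Drives a two-step rollback with two children queued behind it. */
  static ToyExecutor simulate(boolean waitForAsyncProcs) {
    ToyExecutor exec = new ToyExecutor();
    ToyProc rollback = new ToyProc("merge-rollback", 2);
    exec.queue.add(rollback);
    exec.queue.add(new ToyProc("trsp-1", 1));
    exec.queue.add(new ToyProc("trsp-2", 1));

    // Test harness: restart after every kill until the procedure under test is done.
    exec.run();
    while (rollback.stepsLeft > 0) {
      exec.restart();
      exec.run();
    }

    // Same shape as the snippet above: clear the flag, restart, let the rest finish.
    if (waitForAsyncProcs && !exec.running) {
      exec.killAfterEveryStep = false;
      exec.restart();
      exec.run();
    }
    return exec;
  }

  public static void main(String[] args) {
    ToyExecutor naive = simulate(false);
    naive.restart();
    naive.run(); // flag still set, so the first child stops the executor again
    System.out.println("naive:   killedBy=" + naive.killedBy + " pending=" + naive.queue.size());

    ToyExecutor drained = simulate(true);
    System.out.println("drained: killedBy=" + drained.killedBy + " pending=" + drained.queue.size());
  }
}
```

In the toy model the drain works because it runs exactly once, after the 
rollback is known to be finished. The real executor is concurrent, so this 
sketch says nothing about where the actual race is.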

Let me think about how to make it more stable...

> TestMergeTableRegionsProcedure is flakey
> ----------------------------------------
>
>                 Key: HBASE-21334
>                 URL: https://issues.apache.org/jira/browse/HBASE-21334
>             Project: HBase
>          Issue Type: Bug
>          Components: amv2, proc-v2
>            Reporter: Duo Zhang
>            Priority: Major
>         Attachments: org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure-output.txt
>
>
> {noformat}
> Error Message
> found 5 corrupted procedure(s) on replay
> Stacktrace
> java.io.IOException: found 5 corrupted procedure(s) on replay
>       at org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure.testMergeWithoutPONR(TestMergeTableRegionsProcedure.java:295)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
