[ 
https://issues.apache.org/jira/browse/HBASE-28023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang resolved HBASE-28023.
-------------------------------
    Fix Version/s: 2.5.11
                   2.6.2
     Hadoop Flags: Reviewed
       Resolution: Fixed

Pushed to all active branches.

Thanks [~luoen] for contributing!

> ITBLL's RollingBatchSuspendResumeRsAction runs the "suspendRs" method to 
> perform the action, but it inadvertently uses the "waitForRegionServerToStop" 
> method to check if it was executed successfully.
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-28023
>                 URL: https://issues.apache.org/jira/browse/HBASE-28023
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 3.0.0-alpha-1, 2.7.0
>            Reporter: Haiping lv
>            Assignee: Haiping lv
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.7.0, 3.0.0-beta-2, 2.5.11, 2.6.2
>
>
> When running ITBLL, all region servers eventually end up suspended.
> The ITBLL command used was:
> {code:java}
> hbase org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList 
> -DIntegrationTestBigLinkedList.table=itbll -m slowDeterministic Loop 10 10 
> 10000000 /tmp/biglinkedlist 100 {code}
> I have summarized the process as follows:
>  # The Action RollingBatchSuspendResumeRsAction in ITBLL will execute the 
> "sudo -u hbase ps ux | grep proc_regionserver | grep -v grep | tr -s ' ' | 
> cut -d ' ' -f2 | xargs kill -s SIGSTOP" command to suspend the RegionServer 
> process.
>  # SIGSTOP only pauses the RegionServer process; it does not kill it, so the 
> process is still listed by ps.
>  # The Action then calls the waitForServiceToStop method to check whether the 
> execution was successful, using the "sudo -u hbase ps ux | grep 
> proc_regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2" command.
>  # waitForServiceToStop checks for a stopped process, which does not match 
> what suspendRs does. The check times out, so ITBLL never resumes the 
> RegionServer process, and eventually all RegionServer processes are left 
> suspended. ITBLL therefore fails to run.
> {code:java}
> 2023-07-21 11:18:23,103 WARN  [ChaosMonkey-2] policies.Policy 
> (DoActionsOncePolicy.java:runOneIteration(51)) - Exception occurred during 
> performing action: java.io.IOException: Timed-out waiting for service to 
> stop: core-1-3,16020,1689908619650
>         at 
> org.apache.hadoop.hbase.DistributedHBaseCluster.waitForServiceToStop(DistributedHBaseCluster.java:282)
>         at 
> org.apache.hadoop.hbase.DistributedHBaseCluster.waitForRegionServerToStop(DistributedHBaseCluster.java:131)
>         at 
> org.apache.hadoop.hbase.chaos.actions.Action.suspendRs(Action.java:200)
>         at 
> org.apache.hadoop.hbase.chaos.actions.RollingBatchSuspendResumeRsAction.perform(RollingBatchSuspendResumeRsAction.java:97)
>         at 
> org.apache.hadoop.hbase.chaos.policies.DoActionsOncePolicy.runOneIteration(DoActionsOncePolicy.java:48)
>         at 
> org.apache.hadoop.hbase.chaos.policies.PeriodicPolicy.run(PeriodicPolicy.java:41)
>         at 
> org.apache.hadoop.hbase.chaos.policies.CompositeSequentialPolicy.run(CompositeSequentialPolicy.java:42)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:750)
>  {code}
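
The mismatch described in the quoted report can be illustrated with a minimal
sketch. It is not the committed fix and not HBase code: the class SuspendCheck
and its methods isStopped and isSuspended are hypothetical names, and the
sketch assumes a Linux procps-style ps whose state field starts with 'T' for a
process stopped by a signal such as SIGSTOP.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration only; not part of HBase. */
final class SuspendCheck {
    private SuspendCheck() {}

    /**
     * "Stopped" in the sense of a stop check: the PID lookup returned nothing,
     * meaning the process has exited.
     */
    static boolean isStopped(List<String> pids) {
        return pids.isEmpty();
    }

    /**
     * "Suspended": the process still exists, but its ps state field begins
     * with 'T' (assumption: Linux procps state codes).
     */
    static boolean isSuspended(String psState) {
        return psState != null && psState.startsWith("T");
    }

    public static void main(String[] args) {
        // After SIGSTOP the PID is still listed, so a stop check never passes.
        List<String> pids = Arrays.asList("12345"); // made-up PID
        String state = "Tl";                        // made-up ps state value
        System.out.println("stopped=" + isStopped(pids));
        System.out.println("suspended=" + isSuspended(state));
    }
}
```

With these made-up inputs the stop check reports false while the suspend check
reports true, which is why waiting for a stop after SIGSTOP can only time out.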



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
