[ https://issues.apache.org/jira/browse/HBASE-29139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jack Yang updated HBASE-29139:
------------------------------
    Description: 
When decommissioning a RegionServer with `sh /home/hbase/bin/graceful_stop.sh --maxthreads 32 --nobalancer localhost`, the region moves get stuck with the following errors:
{code:java}
2025-02-18 11:33:56,999 ERROR [pool-6-thread-23] util.MoveWithAck: Region: ns1:test1,1014357|2021-08-28 00:17:49.343,1678468120886.d1c541166fc845ccd5429eb75265f5ee. stuck on rserver1.test.com,16020,1739270418124 for 64.199 sec , newServer=rserver2.test.com,16020,1739273909154
2025-02-18 11:34:02,421 ERROR [pool-4-thread-1] util.RegionMover: Was Not able to move region....Exiting Now
2025-02-18 11:34:02,422 ERROR [pool-4-thread-1] util.RegionMover: Error while unloading regions
java.lang.Exception: Could not move region Exception
        at org.apache.hadoop.hbase.util.RegionMover.waitMoveTasksToFinish(RegionMover.java:548)
        at org.apache.hadoop.hbase.util.RegionMover.submitRegionMovesWhileUnloading(RegionMover.java:506)
        at org.apache.hadoop.hbase.util.RegionMover.unloadRegions(RegionMover.java:482)
        at org.apache.hadoop.hbase.util.RegionMover.lambda$unloadRegions$3(RegionMover.java:449)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750){code}
The RS being decommissioned belongs to rsgroup A, but RegionMover tries to move its regions to an RS that belongs to rsgroup B, so the moves get stuck. In this situation, getTargetServer() should only consider target RegionServers in the same rsgroup as the server being decommissioned.
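A minimal sketch of the proposed rsgroup filtering, written in plain Java rather than against HBase's real `RegionMover` or rsgroup APIs. It assumes server-to-rsgroup membership is already available as a `Map`; the class and method names (`RsGroupAwareTargetSelection`, `sameGroupTargets`, `pickTarget`), the `"default"` fallback group, and the `rserver3` host are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch, not HBase's RegionMover code: restrict move targets
// to RegionServers that share the source server's rsgroup.
final class RsGroupAwareTargetSelection {

  // Assumption: servers missing from the membership map belong to "default".
  static final String DEFAULT_GROUP = "default";

  // Returns the candidates in the same rsgroup as the source, excluding the source itself.
  static List<String> sameGroupTargets(String source, List<String> candidates,
      Map<String, String> serverToGroup) {
    String sourceGroup = serverToGroup.getOrDefault(source, DEFAULT_GROUP);
    return candidates.stream()
        .filter(s -> !s.equals(source))
        .filter(s -> sourceGroup.equals(serverToGroup.getOrDefault(s, DEFAULT_GROUP)))
        .collect(Collectors.toList());
  }

  // Picks a target by cycling through the eligible servers using the move index;
  // fails fast instead of falling back to a server in another rsgroup.
  static String pickTarget(List<String> eligible, int moveIndex) {
    if (eligible.isEmpty()) {
      throw new IllegalStateException("no other RegionServer in the source rsgroup");
    }
    return eligible.get(Math.floorMod(moveIndex, eligible.size()));
  }

  public static void main(String[] args) {
    String rs1 = "rserver1.test.com,16020,1739270418124"; // group A, being decommissioned
    String rs2 = "rserver2.test.com,16020,1739273909154"; // group B
    String rs3 = "rserver3.test.com,16020,1739273900000"; // hypothetical group A member
    Map<String, String> groups = Map.of(rs1, "A", rs2, "B", rs3, "A");

    List<String> eligible = sameGroupTargets(rs1, List.of(rs1, rs2, rs3), groups);
    System.out.println(eligible); // only rs3 survives the filter; rs2 is in group B
    System.out.println(pickTarget(eligible, 0));
  }
}
```

Failing fast when the source rsgroup has no other server is a choice made in this sketch: it surfaces the problem immediately instead of issuing a cross-group move that may stall as in the log above.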



> RegionMover does not consider the RS Group when selecting the target 
> RegionServer during the loadRegions() process.
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-29139
>                 URL: https://issues.apache.org/jira/browse/HBASE-29139
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.4.13
>            Reporter: Jack Yang
>            Assignee: Jack Yang
>            Priority: Minor
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
