[ 
https://issues.apache.org/jira/browse/HBASE-25334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248737#comment-17248737
 ] 

Xiaolin Ha commented on HBASE-25334:
------------------------------------

I ran TestRSGroupsFallback.testFallback on branch-2 locally, and it failed 
occasionally.

I added an assertion to verify my theory about the balancer, and it failed as I expected.

!1607918235175-image.png|width=569,height=281!

I think the root cause may be that RSGroupBasedLoadBalancer corrects 
table assignments using cached group info, but when a server comes online, 
the cache is updated by a listener thread, so the balancer may see stale group info.
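
A minimal sketch of the suspected race, assuming a cache that is refreshed only by a listener thread. All class and method names here are hypothetical stand-ins, not HBase code; the latch only makes the ordering deterministic for illustration.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical illustration; none of these names are HBase classes or methods.
public class StaleGroupCacheSketch {
    // Cached view of the servers in a group, refreshed only by the listener thread.
    private final Set<String> cachedServers = ConcurrentHashMap.newKeySet();

    // Simulates the listener that updates the cache some time after a server comes online.
    Thread serverOnline(String server, CountDownLatch release) {
        Thread listener = new Thread(() -> {
            try {
                release.await(); // the listener has not been scheduled yet
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            cachedServers.add(server);
        });
        listener.start();
        return listener;
    }

    // Simulates the balancer consulting the cache to decide where regions may go.
    boolean balancerSeesServer(String server) {
        return cachedServers.contains(server);
    }

    // Returns {seen before the listener ran, seen after the listener ran}.
    static boolean[] demo() throws InterruptedException {
        StaleGroupCacheSketch sketch = new StaleGroupCacheSketch();
        CountDownLatch release = new CountDownLatch(1);
        Thread listener = sketch.serverOnline("rs1", release);

        boolean before = sketch.balancerSeesServer("rs1"); // balancer runs first: stale view
        release.countDown();                               // now let the listener update the cache
        listener.join();
        boolean after = sketch.balancerSeesServer("rs1");  // cache is current
        return new boolean[] {before, after};
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] result = demo();
        System.out.println("before listener: " + result[0] + ", after listener: " + result[1]);
    }
}
```

In this sketch the server is already online when the balancer reads the cache, yet the balancer does not see it until the listener has run.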

I pushed a new PR for branch-2; the test passed locally 20 times.

[~stack] Could you help review and test it? Thanks.

 

> TestRSGroupsFallback.testFallback is flaky
> ------------------------------------------
>
>                 Key: HBASE-25334
>                 URL: https://issues.apache.org/jira/browse/HBASE-25334
>             Project: HBase
>          Issue Type: Test
>            Reporter: Xiaolin Ha
>            Assignee: Xiaolin Ha
>            Priority: Major
>         Attachments: 1607918235175-image.png, 
> image-2020-12-13-10-15-55-445.png
>
>
> For example, see the CI test results of PR [https://github.com/apache/hbase/pull/2699];
> the failed UT output is at 
> [https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2699/3/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-server.txt]
>  
> This unit test checks that all table regions are assigned after balance, and 
> then asserts on the RS group of the regions.
> But balance() uses async moves and throttles region moves, sleeping 
> between moves until all the table regions are moved to their RSGroup.
> If the waiting time is shorter than the region movement duration, the 
> assertion will fail.
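
To illustrate the timing problem described above, here is a minimal sketch of polling for a condition with a timeout instead of waiting a fixed time. The helper `waitFor` and its parameters are hypothetical names for illustration; this is not necessarily what the PR does.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration; not taken from the HBase test code.
public class WaitForSketch {
    // Polls the condition every intervalMs until it holds or timeoutMs elapses.
    // Returns true if the condition held before the timeout, false otherwise.
    static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition)
            throws InterruptedException {
        long start = System.nanoTime();
        long timeoutNanos = timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - start >= timeoutNanos) {
                return false;
            }
            Thread.sleep(intervalMs);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for "all regions have reached their RSGroup": true on the third check.
        AtomicInteger polls = new AtomicInteger();
        boolean done = waitFor(5_000, 10, () -> polls.incrementAndGet() >= 3);
        System.out.println(done + " after " + polls.get() + " polls");
    }
}
```

A test written this way asserts only after the condition is observed, so its result does not depend on how long the throttled moves take, as long as they finish within the timeout.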



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
