[
https://issues.apache.org/jira/browse/HBASE-25334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248737#comment-17248737
]
Xiaolin Ha commented on HBASE-25334:
------------------------------------
I ran branch-2 TestRSGroupsFallback.testFallback locally, and it failed
occasionally.
I added an assertion to verify my hypothesis about the balancer, and it failed as I expected.
!1607918235175-image.png|width=569,height=281!
I think the root cause may be that RSGroupBasedLoadBalancer corrects
table assignments using cached group info, but when a server comes online,
the cache is updated asynchronously by a listener thread, so the balancer may act on stale group info.
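To illustrate the suspected race, here is a minimal Java sketch, assuming a simplified model: a server-to-group map that only a listener thread updates after a server comes online. GroupInfoCache, StaleCacheDemo and awaitServer are hypothetical names, not HBase's RSGroupInfoManager or RSGroupBasedLoadBalancer API, and this is not the change in the PR. A latch holds the listener back so the stale read happens deterministically.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model of a group-info cache; NOT HBase's RSGroupInfoManager API.
class GroupInfoCache {
  // server -> RS group; written only by the listener thread
  private final Map<String, String> serverToGroup = new ConcurrentHashMap<>();
  private final ExecutorService listener = Executors.newSingleThreadExecutor();

  // A server came online: the cache update is queued on the listener thread.
  // `release` lets the demo control when the listener actually runs.
  void onServerOnline(String server, String group, CountDownLatch release) {
    listener.submit(() -> {
      try {
        release.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      serverToGroup.put(server, group);
    });
  }

  // What a balancer reading the cache would see (hypothetical fallback to "default").
  String groupOf(String server) {
    return serverToGroup.getOrDefault(server, "default");
  }

  // Poll until the cache reflects the server, or give up after timeoutMs.
  boolean awaitServer(String server, long timeoutMs) throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (!serverToGroup.containsKey(server)) {
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      Thread.sleep(10);
    }
    return true;
  }

  void shutdown() {
    listener.shutdownNow();
  }
}

class StaleCacheDemo {
  // Returns {group seen before the listener ran, whether the wait succeeded, group seen after}.
  static String[] run() throws InterruptedException {
    GroupInfoCache cache = new GroupInfoCache();
    CountDownLatch release = new CountDownLatch(1);
    cache.onServerOnline("rs1", "group1", release);
    String before = cache.groupOf("rs1"); // listener still blocked: stale "default"
    release.countDown();                  // let the listener apply the update
    boolean seen = cache.awaitServer("rs1", 5_000);
    String after = cache.groupOf("rs1");
    cache.shutdown();
    return new String[] {before, String.valueOf(seen), after};
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(String.join(",", run()));
  }
}
```

StaleCacheDemo.run() returns {"default", "true", "group1"}: a read taken before the listener runs sees the stale group, while waiting until the cache reflects the new server gives the expected one.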
I pushed a new PR for branch-2; it passed locally 20 times.
[~stack] Could you help to review and test it? Thanks.
> TestRSGroupsFallback.testFallback is flaky
> ------------------------------------------
>
> Key: HBASE-25334
> URL: https://issues.apache.org/jira/browse/HBASE-25334
> Project: HBase
> Issue Type: Test
> Reporter: Xiaolin Ha
> Assignee: Xiaolin Ha
> Priority: Major
> Attachments: 1607918235175-image.png,
> image-2020-12-13-10-15-55-445.png
>
>
> For example, see the CI test results of PR [https://github.com/apache/hbase/pull/2699];
> the failed UTs are reported at
> [https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2699/3/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-server.txt]
> This unit test checks whether all table regions are assigned after balance(),
> and then asserts the RS group of the regions.
> But balance() moves regions asynchronously and throttles the moves, sleeping
> between moves until all the table regions are moved to their RSGroup.
> If the waiting time is not longer than the region movement duration, the
> assertion will fail.
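A minimal Java sketch of the timing problem described in the issue, assuming a simplified model where balance() returns at once and a background thread moves one region per throttle interval. ThrottledMoveDemo, balanceAsync and waitFor are hypothetical names, not HBase's balancer or test-utility API, and this is not necessarily what the PR does. It contrasts checking the regions right away with polling the actual condition under a generous timeout.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical model of a throttled, asynchronous balance; NOT HBase's API.
class ThrottledMoveDemo {
  // Poll `condition` every intervalMs until it holds or timeoutMs elapses.
  static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition)
      throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return condition.getAsBoolean(); // one last check at the deadline
      }
      Thread.sleep(intervalMs);
    }
    return true;
  }

  // Returns immediately; a background thread moves one region per throttleMs.
  static Thread balanceAsync(Map<String, String> regionToGroup, List<String> regions,
      String targetGroup, long throttleMs) {
    Thread mover = new Thread(() -> {
      for (String region : regions) {
        try {
          Thread.sleep(throttleMs); // throttle between moves
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        regionToGroup.put(region, targetGroup);
      }
    });
    mover.setDaemon(true);
    mover.start();
    return mover;
  }

  static boolean allInGroup(Map<String, String> regionToGroup, List<String> regions, String group) {
    for (String region : regions) {
      if (!group.equals(regionToGroup.get(region))) {
        return false;
      }
    }
    return true;
  }

  // Returns {result of checking immediately after balance, result of polling with a timeout}.
  static boolean[] run() throws InterruptedException {
    List<String> regions = Arrays.asList("r1", "r2", "r3", "r4");
    Map<String, String> regionToGroup = new ConcurrentHashMap<>();
    for (String region : regions) {
      regionToGroup.put(region, "default");
    }
    balanceAsync(regionToGroup, regions, "group1", 100);
    // Asserting now (or after a too-short fixed sleep) would usually see a partial move.
    boolean immediate = allInGroup(regionToGroup, regions, "group1");
    // Waiting on the condition itself tolerates however long the throttled moves take.
    boolean waited = waitFor(10_000, 20, () -> allInGroup(regionToGroup, regions, "group1"));
    return new boolean[] {immediate, waited};
  }

  public static void main(String[] args) throws InterruptedException {
    boolean[] r = run();
    System.out.println("immediate=" + r[0] + ", waited=" + r[1]);
  }
}
```

Polling on the condition rather than sleeping for a fixed time means the test only fails if the moves never finish within the timeout, not when they are merely slow.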
--
This message was sent by Atlassian Jira
(v8.3.4#803005)