[
https://issues.apache.org/jira/browse/HBASE-22767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16898054#comment-16898054
]
Guanghao Zhang commented on HBASE-22767:
----------------------------------------
I think the better fix would be to fall back to the default group to find an
RS. Then there is no need to add a BOGUS_SERVER_NAME and no need to fix these
special cases.
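
To make the fallback idea concrete, here is a rough sketch. It is not the
actual AM or RSGroup balancer code: the Server class, the integer versions,
and the candidates() helper are all hypothetical, and it models only two
groups (the system table's group and the default group). It assumes the
"highest version" is computed across both groups.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SystemTableCandidates {
    /** Hypothetical stand-in for a region server: a name plus a comparable version. */
    static final class Server {
        final String name;
        final int version;

        Server(String name, int version) {
            this.name = name;
            this.version = version;
        }

        @Override
        public String toString() {
            return name + "(v" + version + ")";
        }
    }

    /** Keeps only the servers running exactly the given version. */
    static List<Server> withVersion(List<Server> servers, int version) {
        List<Server> out = new ArrayList<>();
        for (Server s : servers) {
            if (s.version == version) {
                out.add(s);
            }
        }
        return out;
    }

    /**
     * Picks assignment candidates for a system table region.
     * First tries highest-version servers inside the table's own group;
     * if there are none, falls back to highest-version servers in the
     * default group instead of returning a placeholder (bogus) server.
     */
    static List<Server> candidates(List<Server> tableGroup, List<Server> defaultGroup) {
        List<Server> all = new ArrayList<>(tableGroup);
        all.addAll(defaultGroup);
        if (all.isEmpty()) {
            return Collections.emptyList();
        }
        int highest = Integer.MIN_VALUE;
        for (Server s : all) {
            highest = Math.max(highest, s.version);
        }
        List<Server> inGroup = withVersion(tableGroup, highest);
        if (!inGroup.isEmpty()) {
            return inGroup;
        }
        // Assumed fallback: the table's group has no highest-version server.
        return withVersion(defaultGroup, highest);
    }

    public static void main(String[] args) {
        // The meta group's only server is one version behind the default group.
        List<Server> metaGroup = Arrays.asList(new Server("rs-meta", 1));
        List<Server> defaultGroup =
            Arrays.asList(new Server("rs-a", 2), new Server("rs-b", 2));
        System.out.println(candidates(metaGroup, defaultGroup));
    }
}
```

In this sketch, the scenario from the issue (a one-server group left on the
older version) yields the default group's servers as candidates, so the region
has a real server to go to while the group server is being upgraded.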
> System table RIT STUCK if their RSGroup has no highest version RSes
> -------------------------------------------------------------------
>
> Key: HBASE-22767
> URL: https://issues.apache.org/jira/browse/HBASE-22767
> Project: HBase
> Issue Type: Bug
> Components: rsgroup
> Reporter: Xiaolin Ha
> Assignee: Xiaolin Ha
> Priority: Major
>
> AM chooses the highest-version region servers as candidates for system
> tables, including the META table. If the system table group has no
> highest-version region servers, their regions are always reassigned to the
> BOGUS server defined in RSGroup.
> In our test environment using branch-2.2, we isolate system tables in an
> rsgroup containing only one server. When upgrading RSs, we hit the problem
> that META is always assigned to the BOGUS server even though the group
> server has already been online for a while. The META RIT is stuck and cannot
> be recovered by hbck2.
> I wrote a UT that reproduces this problem. The steps are:
> 1. add a group, move 1 server to it;
> 2. move meta table to the group;
> 3. restart the group server and downgrade its version;
> 4. the meta RIT is stuck.
>
> The root cause is that AM filters for the highest-version RSs when assigning
> system tables. So if we leave the system table group servers at their
> current version but upgrade the servers in other groups, any reassignment of
> a system table region, such as the balancer moving it, leaves the RIT stuck.
>
>
>
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)