[ 
https://issues.apache.org/jira/browse/HBASE-22767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaolin Ha updated HBASE-22767:
-------------------------------
    Comment: was deleted

(was: Here "version" means the package version. System tables are always 
assigned to the servers with the highest version, but this design does not 
work with rsgroup: if system tables are isolated in an rsgroup that has no 
highest-version servers, their regions will get stuck in RIT.

I approve of removing the BOGUS server, but we still need to find a server for 
regions whose group has no online servers. [~zghaobac] suggested randomly 
choosing servers from the DEFAULT group. But does some of the work in 
HBASE-22514 still need to be done before implementing this idea? Or do we need 
to fix this issue on other branches? )
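
The fallback suggested in the comment above can be sketched roughly as 
follows. This is only an illustration of the idea, not HBase code: the class 
GroupFallbackSketch, the method pickServer and the plain-string server names 
are all hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.Random;

public class GroupFallbackSketch {

    /**
     * Picks a random online server from the region's own group. If that
     * group has no online servers, falls back to a random online server
     * from the DEFAULT group. Returns empty if both lists are empty.
     * (Hypothetical helper; server names are plain strings here.)
     */
    static Optional<String> pickServer(List<String> onlineInGroup,
                                       List<String> onlineInDefault,
                                       Random rnd) {
        List<String> pool = onlineInGroup.isEmpty() ? onlineInDefault : onlineInGroup;
        if (pool.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(pool.get(rnd.nextInt(pool.size())));
    }

    public static void main(String[] args) {
        List<String> defaultGroup = Arrays.asList("rs2", "rs3");
        // The region's group has no online servers, so a DEFAULT group
        // server is chosen instead of a placeholder.
        System.out.println(pickServer(Collections.emptyList(), defaultGroup, new Random()));
    }
}
```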

> System table RIT STUCK if their RSGroup has no highest version RSes
> -------------------------------------------------------------------
>
>                 Key: HBASE-22767
>                 URL: https://issues.apache.org/jira/browse/HBASE-22767
>             Project: HBase
>          Issue Type: Bug
>          Components: rsgroup
>            Reporter: Xiaolin Ha
>            Assignee: Xiaolin Ha
>            Priority: Major
>
> AM chooses the highest-version region servers as candidates for system 
> tables, including the META table. If the system table group has no 
> highest-version region servers, then its regions are always reassigned to 
> the BOGUS server defined in RSGroup. 
> In our test environment using branch-2.2, we isolated system tables in an 
> rsgroup containing only one server. When upgrading RSs, we hit the 
> problem that META is always assigned to the BOGUS server even though the 
> group server has already been online for a while. The META RIT is stuck and 
> cannot be recovered by hbck2.
> I wrote a UT that reproduces this problem. The steps are:
> 1. add a group and move 1 server to it;
> 2. move the meta table to the group;
> 3. restart the group server and downgrade its version;
> 4. the meta RIT is stuck.
>  
> The root cause is that AM keeps only the highest-version RSs for system 
> tables. So if we do not change the version of the system table group servers 
> but upgrade the servers of other groups, then any reassignment of a system 
> table region, such as the balancer moving it, gets stuck in RIT.
>  
>  
>  
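
The root cause described above can be modelled with a small standalone 
sketch. It is a simplified model, not the actual AssignmentManager or RSGroup 
balancer code: the class and method names, the placeholder server string, and 
the assumption that versions are purely numeric dotted strings are all 
hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

public class SystemTableCandidateSketch {
    // Hypothetical stand-in for the BOGUS server defined in RSGroup.
    static final String BOGUS = "bogus-placeholder-server";

    /** Compares dotted numeric versions such as "2.2.0" and "2.2.1". */
    static int compareVersions(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    /** Keeps only the servers that run the highest version in the cluster. */
    static List<String> highestVersionServers(Map<String, String> versionByServer) {
        Optional<String> highest = versionByServer.values().stream()
                .max(SystemTableCandidateSketch::compareVersions);
        if (!highest.isPresent()) {
            return new ArrayList<>();
        }
        return versionByServer.entrySet().stream()
                .filter(e -> compareVersions(e.getValue(), highest.get()) == 0)
                .map(Map.Entry::getKey)
                .collect(Collectors.toCollection(ArrayList::new));
    }

    /** Version filter first, then the rsgroup restriction; empty result means BOGUS. */
    static String chooseTarget(Map<String, String> versionByServer, Set<String> groupServers) {
        List<String> candidates = highestVersionServers(versionByServer);
        candidates.retainAll(groupServers);
        return candidates.isEmpty() ? BOGUS : candidates.get(0);
    }

    public static void main(String[] args) {
        Map<String, String> versions = new LinkedHashMap<>();
        versions.put("rs1", "2.2.0"); // the only server in the system table group
        versions.put("rs2", "2.2.1"); // servers of other groups, already upgraded
        versions.put("rs3", "2.2.1");
        // rs1 is online but is filtered out by version, so no candidate is left.
        System.out.println(chooseTarget(versions, Collections.singleton("rs1")));
    }
}
```

In this model the group server is online the whole time, yet the intersection 
of "highest version" and "in the group" is empty, which matches the behaviour 
reported in the description.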



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
