[ 
https://issues.apache.org/jira/browse/HBASE-22923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17370820#comment-17370820
 ] 

Viraj Jasani commented on HBASE-22923:
--------------------------------------

Here is what can reproduce this easily (and it can happen unknowingly in 
production):
 # Create a 'system' RSGroup and move all system tables to it.
 # The system tables are now in the 'system' RSGroup and all other tables are 
in the 'default' RSGroup.
 # Bring up a new RegionServer on a higher version. Since its IP address is not 
yet known to the master, it is added to the 'default' RSGroup by default (or, 
equivalently, one of the 'default' RSGroup's RegionServers is restarted on the 
higher version during a rolling upgrade without anyone noticing).
 # A dedicated thread in the AM tries to assign meta (and the other system 
tables) to the newly started, higher-version RS but fails to bring them online 
because that RS is not a member of the 'system' RSGroup. The result is the same 
error that [~wenbang] reported in the Jira description.

From a quick glance, it seems that we have purposefully kept 
RSGroupInfoManager (together with its coproc endpoint) out of hbase-server in 
branch-1 and branch-2 (so the hbase-rsgroup module is not reachable from 
there), whereas in trunk it has been moved back into the hbase-server module.

[~zhangduo] [~zghao] does this mean that in branch-1 and branch-2 the AM has no 
way to interact with the RSGroup APIs directly? If there is a way, it would be 
better for the AM to check that any RS brought up with a higher version than 
the rest of the RSs belongs to the 'system' RSGroup before moving meta to it. 
If the RS does not belong to the 'system' RSGroup, the AM should not move the 
meta region to it, regardless of the version difference.
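
To make the proposed check concrete, here is a minimal standalone sketch. It 
is not HBase code: the class, the method names and the plain String/Map/Set 
types are all hypothetical stand-ins, and it assumes purely numeric dotted 
versions such as 1.4.8. The idea is to filter by 'system' RSGroup membership 
first and only then pick the highest version, so that a higher-version RS in 
the 'default' RSGroup is never chosen for meta.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; none of these names are real HBase classes or methods. */
final class SystemTableTargetSketch {

  /** Compares purely numeric dotted versions, e.g. "1.4.10" is newer than "1.4.8". */
  static int compareVersions(String a, String b) {
    String[] x = a.split("\\.");
    String[] y = b.split("\\.");
    int n = Math.max(x.length, y.length);
    for (int i = 0; i < n; i++) {
      int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
      int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
      if (xi != yi) {
        return Integer.compare(xi, yi);
      }
    }
    return 0;
  }

  /**
   * Returns the servers that may host system tables: members of the system
   * group that run the highest version found inside that group. Servers
   * outside the group are ignored regardless of their version.
   */
  static List<String> eligibleServers(Map<String, String> versionByServer,
                                      Set<String> systemGroupServers) {
    String highest = null;
    for (Map.Entry<String, String> e : versionByServer.entrySet()) {
      if (!systemGroupServers.contains(e.getKey())) {
        continue; // not in the system group: never a candidate
      }
      if (highest == null || compareVersions(e.getValue(), highest) > 0) {
        highest = e.getValue();
      }
    }
    List<String> result = new ArrayList<>();
    if (highest == null) {
      return result; // no live server in the system group
    }
    for (Map.Entry<String, String> e : versionByServer.entrySet()) {
      if (systemGroupServers.contains(e.getKey())
          && compareVersions(e.getValue(), highest) == 0) {
        result.add(e.getKey());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> versions = new LinkedHashMap<>();
    versions.put("rs-a", "1.4.8");    // in the 'system' group
    versions.put("rs-b", "1.4.8");    // in the 'default' group
    versions.put("rs-new", "1.4.10"); // upgraded, landed in 'default'
    Set<String> systemGroup = new HashSet<>(Arrays.asList("rs-a"));
    System.out.println(eligibleServers(versions, systemGroup)); // prints [rs-a]
  }
}
```

With the reproduction above, rs-new has the highest version overall but is 
skipped because it is in the 'default' RSGroup, so meta stays on rs-a.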

[~anoop.hbase]

> hbase:meta is assigned to localhost when we downgrade the hbase version
> -----------------------------------------------------------------------
>
>                 Key: HBASE-22923
>                 URL: https://issues.apache.org/jira/browse/HBASE-22923
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.4.8
>            Reporter: wenbang
>            Priority: Major
>
> When we downgraded the HBase version (with rsgroup enabled), we found that 
> the hbase:meta table could not be assigned.
> {code:java}
> master.AssignmentManager: Failed assignment of hbase:meta,,1.1588230740 to 
> localhost,1,1, trying to assign elsewhere instead; try=1 of 10 
> java.io.IOException: Call to localhost/127.0.0.1:1 failed on local exception: 
> org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the 
> failed servers list: localhost/127.0.0.1:1
> {code}
> hbase group list:
>   HBASE_META group (hbase:meta and other system tables)
>   default group
> 1. Downgrade all servers in HBASE_META first
> 2. The higher-version servers are in default
> 3. hbase:meta is assigned to localhost
> System tables are assigned to a server with the highest version 
> (AssignmentManager#getExcludedServersForSystemTable), 
> but this does not take the rsgroup into account.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
