Bryan Beaudreault created HBASE-26298:
-----------------------------------------
Summary: Downgrading is complicated by refusal to assign system
tables to lower version
Key: HBASE-26298
URL: https://issues.apache.org/jira/browse/HBASE-26298
Project: HBase
Issue Type: Bug
Reporter: Bryan Beaudreault
I was doing some rolling downgrades of test clusters and kept getting into a
state where my automation got stuck trying to drain the final RegionServer in
the cluster. At that point the RegionServer hosts 3 regions: meta, quota, and
namespace. The HMaster outputs logs like: "Passed destination servername
is null/empty so choosing a server at random".
It's very hard to understand what's happening based on that log, so you really
have to look at the code. Tracking down that log line, it becomes somewhat
clear that you are getting trapped by
AssignmentManager.getExcludedServersForSystemTable().
Looking at the code, you can see comments related to the
"hbase.min.version.move.system.tables" config, but the comments are very
unclear. What should I set this to?
This setting was added in https://issues.apache.org/jira/browse/HBASE-22923
which focuses mostly on RSGroup, but this issue is affecting clusters that do
not use RSGroup. The release note also is not super clear.
It would be great to clarify the docs to help the operator know what to change
this to, or perhaps make the config itself more intuitive. For example, could
we just make it an allowlist of versions that can hold system tables? At that
point my path is clear: add the version I'm downgrading to to the allowlist.
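To make the allowlist idea concrete, here is a minimal Java sketch. It is not HBase code: the class name, the config key "hbase.system.tables.allowed.versions", and the choice to treat an empty list as "no restriction" are all assumptions invented for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of an allowlist check; not part of HBase. */
final class SystemTableVersionAllowlist {
  // Hypothetical config key; it does not exist in HBase today.
  static final String KEY = "hbase.system.tables.allowed.versions";

  /** Parses a comma-separated value such as "2.4.6,2.3.7" into a set of versions. */
  static Set<String> parse(String value) {
    Set<String> versions = new TreeSet<>();
    if (value == null) {
      return versions;
    }
    for (String v : value.split(",")) {
      String trimmed = v.trim();
      if (!trimmed.isEmpty()) {
        versions.add(trimmed);
      }
    }
    return versions;
  }

  /**
   * Returns the servers that may NOT hold system tables.
   * Assumption: an empty allowlist means "no restriction".
   */
  static List<String> excludedServers(Map<String, String> serverVersions, Set<String> allowed) {
    List<String> excluded = new ArrayList<>();
    if (allowed.isEmpty()) {
      return excluded;
    }
    for (Map.Entry<String, String> e : serverVersions.entrySet()) {
      if (!allowed.contains(e.getValue())) {
        excluded.add(e.getKey());
      }
    }
    return excluded;
  }

  public static void main(String[] args) {
    Map<String, String> servers = new LinkedHashMap<>();
    servers.put("rs-new", "2.4.6");
    servers.put("rs-old", "2.3.7");

    // Only the current version is allowed: the downgraded server is excluded.
    System.out.println(excludedServers(servers, parse("2.4.6")));       // prints [rs-old]
    // After adding the downgrade target, no server is excluded.
    System.out.println(excludedServers(servers, parse("2.4.6,2.3.7"))); // prints []
  }
}
```

With something like this, the operator's step before a downgrade would be a single, obvious change: append the target version to the list.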
This issue is exacerbated by the fact that, by the time you've realized what
is going on, you're in a somewhat tricky situation: there's only 1 RegionServer
left, and your only way around it is to force stop it or to push a new config
and rolling restart your HMasters. It would be great if this setting could be
updated via Admin, or at the very least were reloadable with
ConfigurationObserver.
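As a rough sketch of what "reloadable" could look like, the Java below keeps the setting in a volatile field and refreshes it from an observer callback. The ConfigObserver interface here is a local stand-in so the example is self-contained (HBase's ConfigurationObserver receives a Hadoop Configuration rather than a Map), and the class names and the empty-string default for "unset" are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a hot-reloadable setting; not part of HBase. */
final class ReloadableMinVersion {
  /** Local stand-in for an observer callback, using a Map instead of a Hadoop Configuration. */
  interface ConfigObserver {
    void onConfigurationChange(Map<String, String> conf);
  }

  static final String KEY = "hbase.min.version.move.system.tables";

  /** Holds the current value and re-reads it whenever the configuration changes. */
  static final class Holder implements ConfigObserver {
    // volatile so assignment threads see an updated value without locking
    private volatile String minVersion;

    Holder(Map<String, String> conf) {
      this.minVersion = read(conf);
    }

    @Override
    public void onConfigurationChange(Map<String, String> conf) {
      this.minVersion = read(conf);
    }

    String minVersion() {
      return minVersion;
    }

    // Assumption: an empty string stands for "not set".
    private static String read(Map<String, String> conf) {
      return conf.getOrDefault(KEY, "");
    }
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    Holder holder = new Holder(conf);
    System.out.println("before: '" + holder.minVersion() + "'");

    // Simulate an operator pushing a new value without restarting the master.
    conf.put(KEY, "2.4.6");
    holder.onConfigurationChange(conf);
    System.out.println("after: '" + holder.minVersion() + "'");
  }
}
```

The point of the design is that the value is read on every use rather than cached once at startup, so a config push can take effect without a rolling restart of the HMasters.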
--
This message was sent by Atlassian Jira
(v8.3.4#803005)