[ https://issues.apache.org/jira/browse/HBASE-26298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461982#comment-17461982 ]
Bryan Beaudreault commented on HBASE-26298:
-------------------------------------------
Thanks for the thoughts, [~vjasani]. Another thought I just had: I wonder if
we could make this easier on operators by at the very least disabling this
behavior for minor or patch version changes. Due to our compatibility
requirements, I don't think we'd introduce a change that requires it in a
patch release (e.g. 2.4.6 -> 2.4.8). I think it's probably even safe to assume
the same for a minor release (e.g. 2.4.6 -> 2.5.0). I think we could probably
handle this in the code, but what do you think about the idea?
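
To make the idea concrete, here is a minimal sketch, assuming version strings
of the form major.minor.patch. The class and method names are hypothetical and
are not the existing AssignmentManager code; the sketch only shows servers
being excluded from hosting system tables when their major version is lower
than the highest major version in the cluster, so that minor and patch
differences are ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of HBase. Illustrates ignoring minor and
// patch differences when deciding which servers may host system tables.
final class SystemTableVersionGate {

  // Parses the leading major component of "major.minor.patch[-suffix]".
  static int majorOf(String version) {
    int dot = version.indexOf('.');
    String head = dot < 0 ? version : version.substring(0, dot);
    return Integer.parseInt(head.trim());
  }

  // Returns the servers whose major version is lower than the highest major
  // version present; only these would be excluded from system tables.
  static List<String> excludedServers(Map<String, String> serverVersions) {
    int highestMajor = Integer.MIN_VALUE;
    for (String version : serverVersions.values()) {
      highestMajor = Math.max(highestMajor, majorOf(version));
    }
    List<String> excluded = new ArrayList<>();
    for (Map.Entry<String, String> entry : serverVersions.entrySet()) {
      if (majorOf(entry.getValue()) < highestMajor) {
        excluded.add(entry.getKey());
      }
    }
    return excluded;
  }

  public static void main(String[] args) {
    Map<String, String> cluster = new TreeMap<>();
    cluster.put("rs1", "2.4.8");
    cluster.put("rs2", "2.4.6");
    cluster.put("rs3", "2.5.0");
    // All servers share major version 2, so nothing is excluded.
    System.out.println(excludedServers(cluster)); // prints []

    cluster.put("rs4", "1.7.1");
    // Only the server on the older major version is excluded.
    System.out.println(excludedServers(cluster)); // prints [rs4]
  }
}
```

With a rule like this, a rolling downgrade from 2.4.8 to 2.4.6 would never
leave the last RegionServer holding meta, quota and namespace.
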
> Downgrading is complicated by refusal to assign system tables to lower version
> ------------------------------------------------------------------------------
>
> Key: HBASE-26298
> URL: https://issues.apache.org/jira/browse/HBASE-26298
> Project: HBase
> Issue Type: Bug
> Reporter: Bryan Beaudreault
> Priority: Minor
>
> I was doing some rolling downgrades of test clusters and keep getting into a
> state where my automation gets stuck trying to drain the final RegionServer
> in the cluster. At this point that RegionServer hosts 3 regions: meta, quota,
> namespace. The HMaster is outputting logs like: "Passed destination
> servername is null/empty so choosing a server at random".
> It's very hard to understand what's happening based on that log, so you really
> have to look at the code. Tracking down that log line, it becomes somewhat
> clear that you are getting trapped by
> AssignmentManager.getExcludedServersForSystemTable().
> Looking at the code, you can see comments related to
> "hbase.min.version.move.system.tables" config, but the comments are very
> unclear. What should I set this to?
> This setting was added in https://issues.apache.org/jira/browse/HBASE-22923
> which focuses mostly on RSGroup, but this issue is affecting clusters that do
> not use RSGroup. The release note also is not super clear.
> It would be great to clarify the docs to help the operator know what to
> change this to, or perhaps make the config itself more intuitive. For
> example, could we just make it an allowlist of versions that can hold system
> tables? At that point my path is clear: add my downgrade target version to
> the allowlist.
> This issue is also exacerbated by the fact that by the time you've realized
> this you're in a somewhat tricky situation where there's only 1 RegionServer
> left and your only way around it is to force stop it or to push a new config
> and rolling restart your HMasters. It would be great if this setting were
> able to be updated via Admin or at the very least reloadable with
> ConfigurationObserver.
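
The allowlist and reload ideas from the description could look roughly like
the sketch below. Everything in it is an assumption made for illustration: the
config key, the class and the method names are hypothetical and do not exist in
HBase. In HBase the reload would presumably be driven from
ConfigurationObserver.onConfigurationChange(Configuration); here reload() takes
the raw comma-separated value so that the example stays self-contained.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch, not part of HBase: an allowlist of versions that may
// hold system tables, replaceable at runtime without a restart.
final class SystemTableVersionAllowlist {

  // Hypothetical config key, invented for this example.
  static final String ALLOWLIST_KEY = "hbase.system.tables.allowed.versions";

  // Swapped atomically on reload so readers never see a partial update.
  private volatile Set<String> allowed = Collections.emptySet();

  // Re-parses the comma-separated value, e.g. "2.4.6, 2.4.8".
  // A null value clears the allowlist.
  void reload(String rawValue) {
    if (rawValue == null) {
      allowed = Collections.emptySet();
      return;
    }
    Set<String> parsed = Arrays.stream(rawValue.split(","))
        .map(String::trim)
        .filter(s -> !s.isEmpty())
        .collect(Collectors.toSet());
    allowed = Collections.unmodifiableSet(parsed);
  }

  // True only if the operator listed this exact version.
  boolean isExplicitlyAllowed(String serverVersion) {
    return allowed.contains(serverVersion);
  }

  public static void main(String[] args) {
    SystemTableVersionAllowlist allowlist = new SystemTableVersionAllowlist();
    System.out.println(allowlist.isExplicitlyAllowed("2.4.6")); // prints false
    allowlist.reload("2.4.6, 2.4.8");
    System.out.println(allowlist.isExplicitlyAllowed("2.4.6")); // prints true
  }
}
```

How such an allowlist would combine with the existing version check is left
open here; the sketch only covers parsing and reloading.
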
--
This message was sent by Atlassian Jira
(v8.20.1#820001)