[
https://issues.apache.org/jira/browse/HBASE-26298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461949#comment-17461949
]
Viraj Jasani commented on HBASE-26298:
--------------------------------------
{quote} * Make AssignmentManager implement ConfigurationObserver, so that we
can live update "hbase.min.version.move.system.tables"{quote}
Sounds good, we can make this change. There is absolutely nothing wrong with a
dev or operator updating this config during an ongoing upgrade and rolling the
value back to its default state afterwards, as required.
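For illustration, a minimal sketch of what a live-reloadable setting could look
like. This is not the actual AssignmentManager change: ConfigObserver is a
hypothetical stand-in for HBase's ConfigurationObserver (which, as far as I
know, receives a Hadoop Configuration rather than a Map), and
SystemTableMovePolicy, DEFAULT_VALUE and LiveConfigDemo are made-up names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for HBase's ConfigurationObserver. A plain Map is
// used here only so the sketch has no external dependencies.
interface ConfigObserver {
    void onConfigurationChange(Map<String, String> conf);
}

// Hypothetical, heavily simplified stand-in for the part of AssignmentManager
// that reads the setting. The value is cached in a volatile field so that a
// reload is visible to other threads without a restart.
class SystemTableMovePolicy implements ConfigObserver {
    static final String KEY = "hbase.min.version.move.system.tables";
    // Placeholder default for this sketch only, not the real HBase default.
    static final String DEFAULT_VALUE = "";

    private volatile String minVersion;

    SystemTableMovePolicy(Map<String, String> conf) {
        this.minVersion = conf.getOrDefault(KEY, DEFAULT_VALUE);
    }

    @Override
    public void onConfigurationChange(Map<String, String> conf) {
        // Re-read the value; removing the key rolls back to the default.
        this.minVersion = conf.getOrDefault(KEY, DEFAULT_VALUE);
    }

    String getMinVersion() {
        return minVersion;
    }
}

class LiveConfigDemo {
    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        SystemTableMovePolicy policy = new SystemTableMovePolicy(conf);

        // Operator overrides the value during the upgrade or downgrade...
        conf.put(SystemTableMovePolicy.KEY, "2.4.0");
        policy.onConfigurationChange(conf);
        System.out.println("during: " + policy.getMinVersion());

        // ...and removes the override once it is finished.
        conf.remove(SystemTableMovePolicy.KEY);
        policy.onConfigurationChange(conf);
        System.out.println("after: '" + policy.getMinVersion() + "'");
    }
}
```

The point of the sketch is only that the observer callback re-reads the key, so
both setting the override and reverting it take effect without restarting the
HMaster.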
{quote} * Improve docs a bit (i will take a stab, and see if you agree with the
new description){quote}
Sure thing, would be great.
{quote}The other thing I was wondering about is whether we could set a better
default value for this. I am guessing the devs are the most knowledgeable about
what incompatibilities exist that would warrant not moving system tables, right?
{quote}
Valid point, devs have a better idea about this sort of incompatibility.
However, in order to set a default value for any given pair of releases, we
would require thorough testing, and performing such testing across all releases
might be exhausting. Let's see if, maybe, we could add integration tests to
cover this? Or how about setting the default based on whether the release is
major vs minor/patch? For a major release, we can assume that there might be
some architectural change in meta vs user request processing such that, if meta
is served from higher versioned servers before any user regions are moved to
higher versioned servers, that setup would handle the ongoing traffic without
any issues, whereas the same might not always be applicable to minor or
maintenance releases.
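To make the major vs minor/patch idea concrete, here is a rough sketch of how
such a default could be derived from two version strings. It is only my reading
of the proposal, not existing HBase code: ReleaseKind, classify and
moveSystemTablesFirstByDefault are hypothetical names, and the parsing assumes
plain "major.minor.patch" versions with an optional "-SUFFIX".

```java
// Hypothetical helper, not part of HBase.
class ReleaseKind {
    enum Kind { MAJOR, MINOR, PATCH, SAME }

    // Parses "2.4.0" or "2.4.0-SNAPSHOT" into {major, minor, patch}.
    // Missing components are treated as 0.
    static int[] parse(String version) {
        String core = version.split("-", 2)[0];
        String[] parts = core.split("\\.");
        int[] out = new int[3];
        for (int i = 0; i < 3 && i < parts.length; i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }

    // Classifies the difference between two versions by the first component
    // that differs.
    static Kind classify(String from, String to) {
        int[] a = parse(from);
        int[] b = parse(to);
        if (a[0] != b[0]) return Kind.MAJOR;
        if (a[1] != b[1]) return Kind.MINOR;
        if (a[2] != b[2]) return Kind.PATCH;
        return Kind.SAME;
    }

    // Assumed policy for this sketch: only across major releases do we
    // default to moving system tables to higher versioned servers first.
    static boolean moveSystemTablesFirstByDefault(String from, String to) {
        return classify(from, to) == Kind.MAJOR;
    }

    public static void main(String[] args) {
        System.out.println(classify("2.4.8", "3.0.0-SNAPSHOT"));
        System.out.println(classify("2.3.7", "2.4.8"));
        System.out.println(moveSystemTablesFirstByDefault("2.4.7", "2.4.8"));
    }
}
```

Whether "major only" is the right cut-off is exactly the open question above;
the sketch just shows that the decision itself would be cheap to compute.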
{quote}I'm going to get to work on the 2 bullets
{quote}
Thank you [~bbeaudreault] !
> Downgrading is complicated by refusal to assign system tables to lower version
> ------------------------------------------------------------------------------
>
> Key: HBASE-26298
> URL: https://issues.apache.org/jira/browse/HBASE-26298
> Project: HBase
> Issue Type: Bug
> Reporter: Bryan Beaudreault
> Priority: Minor
>
> I was doing some rolling downgrades of test clusters and kept getting into a
> state where my automation gets stuck trying to drain the final RegionServer
> in the cluster. At this point that RegionServer hosts 3 regions: meta, quota,
> namespace. The HMaster is outputting logs like: "Passed destination
> servername is null/empty so choosing a server at random".
> It's very hard to understand what's happening based on that log, so you really
> have to look at the code. Tracking down that log line, it becomes somewhat
> clear that you are getting trapped by
> AssignmentManager.getExcludedServersForSystemTable().
> Looking at the code, you can see comments related to
> "hbase.min.version.move.system.tables" config, but the comments are very
> unclear. What should I set this to?
> This setting was added in https://issues.apache.org/jira/browse/HBASE-22923
> which focuses mostly on RSGroup, but this issue is affecting clusters that do
> not use RSGroup. The release note also is not super clear.
> It would be great to clarify the docs to help the operator know what to
> change this to, or perhaps make the config itself more intuitive. For
> example, could we just make it an allowlist of versions that can hold system
> tables? At that point my path is clear: add the version I'm downgrading to to
> the allowlist.
> This issue is also exacerbated by the fact that by the time you've realized
> this you're in a somewhat tricky situation where there's only 1 RegionServer
> left and your only way around it is to force stop it or to push a new config
> and rolling restart your HMasters. It would be great if this setting were
> able to be updated via Admin or at the very least reloadable with
> ConfigurationObserver.