[
https://issues.apache.org/jira/browse/HBASE-26522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502973#comment-17502973
]
Bryan Beaudreault commented on HBASE-26522:
-------------------------------------------
The most disruptive manifestation of that check was
https://issues.apache.org/jira/browse/HBASE-26575 (we hit it before we disabled
the check, so we never tested it in prod). I agree 2.5.0 is an opportunity, but
I think I'd have to do some pretty thorough load testing to determine what a
reasonable default for this is. I have it on a list of things for my team to
circle back to post-upgrade, but I'm not sure I can fit that in in the near
term. As mentioned in that issue, I think one of the biggest problems with that
feature is:
{quote}At this point I'll say that this in general seems overly aggressive,
especially since the StoreHotnessProtector doesn't actually do any checks for
actual load on the RS. You could have a totally idle RegionServer and submit a
single batch of 100 Puts with 101 columns each – if you don't have at least 5
retries configured, the batch will fail.
{quote}
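To make the point in the quote concrete, here is a minimal sketch of a
threshold-only check of that kind. It is a simplified model and not the actual
StoreHotnessProtector code: the class and method names are hypothetical, and
the constants are the defaults mentioned in the issue description below (more
than 100 columns, 10 concurrent writes, 20 pending writes).

```java
// Simplified, hypothetical model of a threshold-only write check.
// It is NOT the real StoreHotnessProtector implementation.
final class HotnessCheckModel {
    // Defaults as described in the issue text; treated as assumptions here.
    static final int MIN_COLUMNS = 100;
    static final int MAX_CONCURRENT_WRITES = 10;
    static final int MAX_PENDING_WRITES = 20;

    // Returns true if a write touching `columnsInWrite` columns would be
    // rejected, given how many writes are running and pending on the store.
    static boolean wouldReject(int columnsInWrite, int concurrentWrites, int pendingWrites) {
        if (columnsInWrite <= MIN_COLUMNS) {
            return false; // narrow writes are never checked
        }
        return concurrentWrites > MAX_CONCURRENT_WRITES
            || pendingWrites > MAX_PENDING_WRITES;
    }

    public static void main(String[] args) {
        // A wide write arriving while 21 others are pending on the same store.
        System.out.println(HotnessCheckModel.wouldReject(101, 0, 21));
    }
}
```

Note that nothing in this model looks at CPU, queue times, or any other load
signal on the RS. The decision depends only on the column count and the two
counters, which is why an otherwise idle server can still reject a single
large batch.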
I'm not entirely sure how to measure load on the RS at that level. One option
might be to check whether the counters it uses only trend upwards over a period
of time, rather than going up and then back down as expected, which would mean
writes are backing up. I would need to look into it further.
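As a rough illustration of that idea, the sketch below samples a counter
periodically and reports a backlog only when a full window of samples never
decreases and ends higher than it started. Everything here is hypothetical
(class name, window size, sampling scheme); it is not an existing HBase API,
just one way the "only trending upwards" test could be expressed.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper: detects a counter that only trends upwards over a
// fixed window of samples. Not part of HBase.
final class BacklogTrend {
    private final int windowSize;
    private final Deque<Long> samples = new ArrayDeque<>();

    BacklogTrend(int windowSize) {
        if (windowSize < 2) {
            throw new IllegalArgumentException("windowSize must be >= 2");
        }
        this.windowSize = windowSize;
    }

    // Record the latest sampled value, dropping the oldest once the window is full.
    void record(long counterValue) {
        if (samples.size() == windowSize) {
            samples.removeFirst();
        }
        samples.addLast(counterValue);
    }

    // True only if the window is full, no sample is lower than the one
    // before it, and the newest sample is higher than the oldest.
    boolean isBackingUp() {
        if (samples.size() < windowSize) {
            return false;
        }
        long previous = Long.MIN_VALUE;
        for (long sample : samples) {
            if (sample < previous) {
                return false; // the counter came back down at some point
            }
            previous = sample;
        }
        return samples.peekLast() > samples.peekFirst();
    }

    public static void main(String[] args) {
        BacklogTrend trend = new BacklogTrend(3);
        trend.record(4);
        trend.record(7);
        trend.record(9);
        System.out.println(trend.isBackingUp());
    }
}
```

A check like this would still need tuning under real load (window length,
sampling interval, tolerance for brief plateaus), so it is a starting point
for experimentation rather than a proposed default.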
> Improve documentation of hbase 1.x to 2.x potential incompatibilities
> ---------------------------------------------------------------------
>
> Key: HBASE-26522
> URL: https://issues.apache.org/jira/browse/HBASE-26522
> Project: HBase
> Issue Type: Improvement
> Reporter: Bryan Beaudreault
> Assignee: Bryan Beaudreault
> Priority: Minor
>
> We're working on a major upgrade of almost 900 tables across 100 production
> clusters (and corresponding QA environment clusters). We've upgraded about
> 25% of our QA environment and run into a series of incompatibilities along
> the way. Most of them have been easy to get around, but I wanted to create
> this Jira to collect them so that we can make an update to the docs for
> future upgraders.
> My plan is to periodically edit this description to add to the list. If
> anyone else has anything to contribute, feel free to edit as well or add a
> comment.
> Incompatibilities to document:
> - HBASE-15676 changed the serialized byte string used for the fuzzy mask.
> FuzzyRowFilters created by older clients will not match any rows in an hbase2
> cluster. This was fixed in HBASE-26537 but should be documented in our
> upgrade guide.
> - CDH5 try/catches bad HTableDescriptor.getDurability calls and returns
> USE_DEFAULT. In hbase2, if someone creates a table with a bad durability
> (e.g. DEFAULT instead of USE_DEFAULT), it results in a failure which causes
> the CreateTableProcedure to retry infinitely with no backoff. This rapid
> retry caused a bunch of pain on the cluster that encountered it, because the
> datanodes could not keep up with the millions of calls to create and delete
> .regioninfo files.
> - This isn't quite an incompatibility, but HBASE-19389 introduced a
> concurrency mitigation which may have surprising results for those coming
> from older versions. The defaults are pretty conservative: when writing more
> than 100 columns, no more than 10 concurrent writes or 20 pending writes are
> allowed at once.
> - Increments sent from branch-1 clients may get erroneously stored with a
> timestamp of 0 on hbase2+ clusters: HBASE-26713
> - CheckAndMutate with a "null" compare value used to ignore CompareOp. This
> was fixed in HBASE-26742, so the effects of checkAndMutate may change between
> versions.
> - client will not know how to handle dangling rep_barrier rows in meta:
> HBASE-26797
> - The default split policy in hbase2 is SteppingSplitPolicy. This is overall
> a good policy which is more likely to split small tables to ensure they are
> spread across more servers. If you upgrade, you may notice your tables
> suddenly getting split more than you're used to. This may be an issue if you
> use a row key prefix, because hbase isn't aware of your prefix and may choose
> split points that break up rows sharing a prefix. You can get around this by
> defining a RegionSplitRestriction. See HBASE-25766
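To illustrate the last item, the sketch below shows the general idea behind a
prefix-based split restriction: truncate the candidate split point to the
prefix length, so that every row sharing a prefix sorts on the same side of
the split. This is a hedged, standalone illustration, not the HBase
implementation; the class name, method name, and sample row key are all
hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration of restricting a split point to a key prefix.
// Not the HBase RegionSplitRestriction code.
final class PrefixSplitPoint {
    // Returns the split point cut down to `prefixLength` bytes. Keys that are
    // already no longer than the prefix are returned unchanged (as a copy).
    static byte[] restrictToPrefix(byte[] splitPoint, int prefixLength) {
        if (prefixLength <= 0) {
            throw new IllegalArgumentException("prefixLength must be positive");
        }
        if (splitPoint.length <= prefixLength) {
            return splitPoint.clone();
        }
        return Arrays.copyOf(splitPoint, prefixLength);
    }

    public static void main(String[] args) {
        byte[] candidate = "user42-row9".getBytes(StandardCharsets.US_ASCII);
        byte[] restricted = restrictToPrefix(candidate, 6);
        System.out.println(new String(restricted, StandardCharsets.US_ASCII));
    }
}
```

Because every row key starting with a given prefix sorts at or after the bare
prefix, splitting at the truncated key keeps the whole prefix group in one
region instead of cutting through the middle of it.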