[ 
https://issues.apache.org/jira/browse/HBASE-26522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502973#comment-17502973
 ] 

Bryan Beaudreault commented on HBASE-26522:
-------------------------------------------

The most disruptive manifestation of that check (which we hit before disabling 
it, so we never tested it in prod) was 
https://issues.apache.org/jira/browse/HBASE-26575. I 
agree 2.5.0 is an opportunity, but I think I'd have to do some pretty thorough 
load testing to determine what a reasonable default for this is. I have it on a 
list of things for my team to circle back to post-upgrade, but I'm not sure I 
can fit that in in the near term. As mentioned in that issue, I think one of the 
biggest problems with that feature is:
{quote}At this point I'll say that this in general seems overly aggressive, 
especially since the StoreHotnessProtector doesn't actually do any checks for 
actual load on the RS. You could have a totally idle RegionServer and submit a 
single batch of 100 Puts with 101 columns each – if you don't have at least 5 
retries configured, the batch will fail.
{quote}
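
To make the quoted scenario concrete, here is a toy model of a threshold-only admission check. It is not the actual StoreHotnessProtector code. The class and method names are hypothetical, the thresholds are the defaults listed in the issue description, and it assumes every Put in a batch counts as pending at the same time and that the client resubmits only the rejected Puts.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model only (hypothetical names); not the real StoreHotnessProtector. */
class HotnessCheckSketch {
    static final int MIN_COLUMNS = 100; // check applies only above this column count
    static final int MAX_PENDING = 20;  // pending writes allowed at once

    private final AtomicInteger pending = new AtomicInteger();

    /** Admits or rejects a put purely on counters; no server load is consulted. */
    boolean tryAdmit(int columns) {
        if (columns <= MIN_COLUMNS) {
            return true; // small puts bypass the check entirely
        }
        if (pending.incrementAndGet() > MAX_PENDING) {
            pending.decrementAndGet(); // roll back the slot we could not take
            return false;
        }
        return true;
    }

    /** Assumption: all slots are released once a batch attempt completes. */
    void finishAttempt() {
        pending.set(0);
    }

    /** Counts the attempts needed to land a whole batch on an otherwise idle server. */
    static int attemptsNeeded(int puts, int columnsPerPut) {
        HotnessCheckSketch store = new HotnessCheckSketch();
        int remaining = puts;
        int attempts = 0;
        while (remaining > 0) {
            attempts++;
            int admitted = 0;
            for (int i = 0; i < remaining; i++) {
                if (store.tryAdmit(columnsPerPut)) {
                    admitted++;
                }
            }
            remaining -= admitted; // only rejected puts are resubmitted
            store.finishAttempt();
        }
        return attempts;
    }

    public static void main(String[] args) {
        System.out.println(attemptsNeeded(100, 101)); // wide puts hit the pending limit
        System.out.println(attemptsNeeded(100, 100)); // at the threshold, the check is skipped
    }
}
```

Under these assumptions a batch of 100 wide Puts needs several attempts even though nothing else is writing, which is the behavior the quote objects to.
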
I'm not entirely sure how to measure load on the RS at that level. Perhaps it 
could check whether the counters it uses only trend upwards over a period of 
time, rather than going up and then down as expected, which would mean writes 
are backing up. I would need to look further.
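
One way the "only trending upwards" idea might look, as a minimal sketch: sample a counter periodically and flag it when every sample in a window is higher than the one before. The class name, the window size, and the strict-increase rule are all assumptions for illustration; nothing like this exists in HBase as far as this thread indicates.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper: flags a counter that only rose across the last N samples. */
class RisingTrendDetector {
    private final int window;
    private final Deque<Long> samples = new ArrayDeque<>();

    RisingTrendDetector(int window) {
        if (window < 2) {
            throw new IllegalArgumentException("window must be >= 2");
        }
        this.window = window;
    }

    /** Records one periodic sample, keeping only the most recent `window` values. */
    void record(long value) {
        samples.addLast(value);
        if (samples.size() > window) {
            samples.removeFirst();
        }
    }

    /** True only when the window is full and each sample exceeds the previous one. */
    boolean isBackingUp() {
        if (samples.size() < window) {
            return false; // not enough history to judge
        }
        Long prev = null;
        for (long s : samples) {
            if (prev != null && s <= prev) {
                return false; // the counter dipped or stalled, so writes are draining
            }
            prev = s;
        }
        return true;
    }

    public static void main(String[] args) {
        RisingTrendDetector d = new RisingTrendDetector(4);
        for (long v : new long[] {1, 3, 5, 8}) {
            d.record(v);
        }
        System.out.println(d.isBackingUp()); // counter rose on every sample
        d.record(6);
        System.out.println(d.isBackingUp()); // counter dropped on the latest sample
    }
}
```

A strict-increase rule is deliberately crude; a real check would probably need to tolerate noise and pick the sampling interval carefully.
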

> Improve documentation of hbase 1.x to 2.x potential incompatibilities
> ---------------------------------------------------------------------
>
>                 Key: HBASE-26522
>                 URL: https://issues.apache.org/jira/browse/HBASE-26522
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Bryan Beaudreault
>            Assignee: Bryan Beaudreault
>            Priority: Minor
>
> We're working on a major upgrade of almost 900 tables across 100 production 
> clusters (and corresponding QA environment clusters). We've upgraded about 
> 25% of our QA environment and run into a series of incompatibilities along 
> the way. Most of them have been easy to get around, but I wanted to create 
> this Jira to collect them so that we can make an update to the docs for 
> future upgraders.
> My plan is to periodically edit this description to add to the list. If 
> anyone else has anything to contribute, feel free to edit as well or add a 
> comment. 
> Incompatibilities to document:
>  -  HBASE-15676 changed the serialized byte string used for the fuzzy mask. 
> FuzzyRowFilters created by older clients will not match any rows in an hbase2 
> cluster. This was fixed in HBASE-26537 but should be documented in our 
> upgrade guide.
>  - CDH5 wraps HTableDescriptor.getDurability in a try/catch and returns 
> USE_DEFAULT for bad values. In hbase2, if someone creates a table with a bad 
> durability (e.g. DEFAULT instead of USE_DEFAULT), it results in a failure 
> which causes the CreateTableProcedure to retry infinitely with no backoff. 
> This rapid retry caused a bunch of pain on the cluster that encountered it, 
> overwhelming the datanodes' ability to keep up with the millions of calls to 
> create and delete .regioninfo files.
>  - This isn't quite an incompatibility, but HBASE-19389 introduced a 
> concurrency mitigation which may have surprising results for those coming 
> from older versions. The defaults are pretty conservative – when writing more 
> than 100 columns, no more than 10 concurrent writes or 20 pending writes are 
> allowed at once.
>  - Increments sent from branch-1 clients may get erroneously stored with a 
> timestamp of 0 on hbase2+ clusters: HBASE-26713
>  - CheckAndMutate with a "null" compare value used to ignore CompareOp. This 
> was fixed in HBASE-26742, so checkAndMutate effects may change between 
> versions.
>  - client will not know how to handle dangling rep_barrier rows in meta: 
> HBASE-26797
>  - The default hbase2 split policy is SteppingSplitPolicy. This is overall a 
> good policy which is more likely to split small tables to ensure they are 
> spread across more servers. If you upgrade, you may notice your tables 
> suddenly getting split more often than you're used to. This may be an issue 
> if you use a row key prefix, because hbase isn't aware of your prefix and may 
> choose split points that separate rows sharing a prefix. You can get around 
> this by defining a RegionSplitRestriction. See HBASE-25766
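
To illustrate the row key prefix concern in the last item, here is a minimal sketch of what restricting a split point to a prefix means. It assumes a fixed-length prefix; the class and method names are hypothetical, and this is not the HBase RegionSplitRestriction API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical illustration of prefix-restricted split points. */
class PrefixSplitSketch {
    /**
     * Truncates a candidate split key to a fixed prefix length, so the region
     * boundary falls before the first row with that prefix and all rows
     * sharing the prefix stay in one region.
     */
    static byte[] restrictToPrefix(byte[] candidate, int prefixLength) {
        return Arrays.copyOf(candidate, Math.min(prefixLength, candidate.length));
    }

    public static void main(String[] args) {
        // Made-up row key: a 6-byte prefix followed by a date suffix.
        byte[] candidate = "user42|2022-03-08".getBytes(StandardCharsets.UTF_8);
        byte[] restricted = restrictToPrefix(candidate, 6);
        System.out.println(new String(restricted, StandardCharsets.UTF_8));
    }
}
```

Without such a restriction, the unmodified candidate key could become the boundary, leaving some rows for that prefix in one region and the rest in the next.
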



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
