[
https://issues.apache.org/jira/browse/HBASE-26575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459465#comment-17459465
]
Bryan Beaudreault commented on HBASE-26575:
-------------------------------------------
For now we've disabled the StoreHotnessProtector in our environment by setting
hbase.region.store.parallel.put.limit to 0. We may revisit this in the future,
since the overall intent of the feature seems good, but more thought is needed
on what a safe configuration looks like. We're also concerned about how this
would affect our many thousands of clients.
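For reference, the override is just a single property in hbase-site.xml (the property name is the one discussed in this thread; 0 disables the protector entirely):

```xml
<!-- Disable StoreHotnessProtector; 0 turns the feature off. -->
<property>
  <name>hbase.region.store.parallel.put.limit</name>
  <value>0</value>
</property>
```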
In the meantime, we're definitely open to other volunteers or opinions on this
issue.
It seems like this particular Jira could simply result in a change to default
values or error handling such that ReplicationSink and StoreHotnessProtector
can coexist without replication failures.
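For concreteness, here's the retry arithmetic from the description below, as a best-case sketch. It assumes each client attempt lands exactly the prepare-limit number of rows (20 by default) and everything else fails and is retried, which matches the automatic chunking behavior described in the report:

```java
public class RetryMath {
    // Best-case number of client attempts needed to push a batch through
    // a Store when StoreHotnessProtector admits at most prepareLimit
    // concurrent prepares per attempt.
    public static int attemptsNeeded(int batchSize, int prepareLimit) {
        int attempts = 0;
        int remaining = batchSize;
        while (remaining > 0) {
            remaining -= Math.min(prepareLimit, remaining);
            attempts++;
        }
        return attempts;
    }

    public static void main(String[] args) {
        // 5000-row ReplicationSink batch, 20 prepares allowed per attempt:
        // 250 attempts in the best case, versus the sink's default of 4 retries.
        System.out.println(attemptsNeeded(5000, 20));
    }
}
```

Even in this ideal case the sink would need roughly 250 attempts to drain one full batch, so 4 retries cannot possibly succeed.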
> StoreHotnessProtector may block Replication
> -------------------------------------------
>
> Key: HBASE-26575
> URL: https://issues.apache.org/jira/browse/HBASE-26575
> Project: HBase
> Issue Type: Bug
> Reporter: Bryan Beaudreault
> Priority: Major
>
> I'm upgrading from hbase1 to hbase2, and I'm still in my QA environment where
> load is very low. Even still, I've noticed some bad interaction between
> Replication and the StoreHotnessProtector.
> The ReplicationSink collects edits from the WAL and executes them in batches
> via the normal HTable interface. The maximum batch size is derived from
> "hbase.rpc.rows.warning.threshold" (default 5000), despite that property's
> name suggesting a warning threshold rather than a batch limit.
> The StoreHotnessProtector defaults to allowing 10 concurrent writes (of 100
> columns or more) to a Store, or 20 concurrent "prepares" of said writes. The
> Prepare part is what causes issues here. When a batch mutate comes in, the RS
> first takes a lock on all rows in the batch. This happens in
> HRegion#lockRowsAndBuildMiniBatch, and the writes are recorded as "preparing"
> in StoreHotnessProtector before acquiring the lock. This recording basically
> increments a counter, and throws an exception if that counter goes over 20.
> Back in HRegion#lockRowsAndBuildMiniBatch, the exception is caught and
> recorded in the results for any items that failed. Any items that succeed
> continue on to write, unless the write is atomic, in which case it
> immediately throws an exception.
> This response gets back to the client, which automatically handles retries.
> With enough retries, the batch call will eventually succeed because each
> retry contains fewer and fewer writes to handle. Assuming you have enough
> retries, this effectively enforces an automatic chunking of a batch write
> into sub-batches of 20. Again, this only affects writes that hit more than
> 100 columns (by default).
> At this point I'll say that this in general seems overly aggressive,
> especially since the StoreHotnessProtector doesn't actually do any checks for
> actual load on the RS. You could have a totally idle RegionServer and submit
> a single batch of 100 Puts with 101 columns each – if you don't have at least
> 5 retries configured, the batch will fail.
> Back to ReplicationSink, the default batch size is 5000 Puts and the default
> retries is 4. For a table with wide rows (which might cause replication to
> try to sink Puts with more than 100 columns), it becomes basically impossible
> to replicate because the number of retries is not nearly enough to move
> through a batch of up to 5000, 20 at a time.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)