Bryan Beaudreault created HBASE-26575:
-----------------------------------------
Summary: StoreHotnessProtector may block Replication
Key: HBASE-26575
URL: https://issues.apache.org/jira/browse/HBASE-26575
Project: HBase
Issue Type: Bug
Reporter: Bryan Beaudreault
I'm upgrading from hbase1 to hbase2, and I'm still in my QA environment where
load is very low. Even so, I've noticed a bad interaction between Replication
and the StoreHotnessProtector.
The ReplicationSink collects edits from the WAL and executes them in batches
via the normal HTable interface. The max batch size is taken from
"hbase.rpc.rows.warning.threshold" (default 5000), despite that property's name
suggesting it only controls a warning.
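For reference, the sink-side defaults involved, as I read them (the second key
comes from my reading of ReplicationSink's connection setup, so worth
double-checking):

  hbase.rpc.rows.warning.threshold       = 5000  (used as the sink's max batch size)
  replication.sink.client.retries.number = 4     (retries for the sink's client)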
The StoreHotnessProtector defaults to allowing 10 concurrent writes (of more
than 100 columns) to a Store, or 20 concurrent "prepares" of such writes. The
prepare step is what causes issues here. When a batch mutate comes in, the RS
first takes a lock on all rows in the batch. This happens in
HRegion#lockRowsAndBuildMiniBatch, and each write is recorded as "preparing" in
StoreHotnessProtector before its row lock is acquired. The recording basically
increments a counter and throws an exception if that counter goes over 20.
Back in HRegion#lockRowsAndBuildMiniBatch, the exception is caught and recorded
in the results for each item that failed. Any items that succeed continue on
to the write, unless the batch is atomic, in which case the exception is
immediately rethrown.
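To illustrate, here is a minimal sketch of that prepare accounting, assuming
the behavior described above (class and method names are mine, not the actual
implementation; the config keys in the comments are the ones I see in the
code):

  import java.util.concurrent.atomic.AtomicInteger;

  // Condensed model of StoreHotnessProtector's prepare accounting.
  class StoreHotnessSketch {
    // hbase.region.store.parallel.put.limit (default 10)
    static final int PARALLEL_PUT_LIMIT = 10;
    // hbase.region.store.parallel.prepare.put.multiplier (default 2),
    // so 2 * 10 = 20 concurrent prepares per Store
    static final int PARALLEL_PREPARE_LIMIT = 2 * PARALLEL_PUT_LIMIT;
    // hbase.region.store.parallel.put.limit.min.column.count (default 100)
    static final int MIN_COLUMN_COUNT = 100;

    private final AtomicInteger preparePutCounter = new AtomicInteger();

    // Called per Store before the row lock is taken in
    // HRegion#lockRowsAndBuildMiniBatch.
    void startPrepare(int columnCount) {
      if (columnCount <= MIN_COLUMN_COUNT) {
        return; // narrow writes are never throttled
      }
      // A pure counter: no check of actual load on the RegionServer.
      if (preparePutCounter.incrementAndGet() > PARALLEL_PREPARE_LIMIT) {
        preparePutCounter.decrementAndGet();
        // Stands in for the RegionTooBusyException the real class throws.
        throw new RuntimeException("too busy: over " + PARALLEL_PREPARE_LIMIT
            + " concurrent prepares on this Store");
      }
    }

    // Called once the prepare completes, releasing the slot.
    void finishPrepare(int columnCount) {
      if (columnCount > MIN_COLUMN_COUNT) {
        preparePutCounter.decrementAndGet();
      }
    }
  }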
This response gets back to the client, which automatically handles retries.
With enough retries, the batch call will eventually succeed, because each
retry contains fewer and fewer writes to handle. Assuming you have enough
retries, this basically enforces automatic chunking of a batch write into
sub-batches of 20. Again, this only affects writes that hit more than 100
columns (by default).
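To put a number on it: with the prepare limit at 20, a batch of N qualifying
writes needs roughly ceil(N / 20) attempts to fully drain, since each attempt
lets only ~20 rows per Store through and the rest come back as retries.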
At this point I'll say that this in general seems overly aggressive, especially
since the StoreHotnessProtector doesn't do any checks against actual load on
the RS. You could have a totally idle RegionServer and submit a single batch
of 100 Puts with 101 columns each; if you don't have at least 5 retries
configured, the batch will fail.
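For what it's worth, a repro sketch along those lines against the client API
(the table name, family, and qualifiers are placeholders; the configs are the
standard client ones):

  import java.util.ArrayList;
  import java.util.List;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.util.Bytes;

  public class HotnessRepro {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      // Fewer than the ~5 attempts needed to drain 100 rows, 20 at a time.
      conf.setInt("hbase.client.retries.number", 4);

      try (Connection conn = ConnectionFactory.createConnection(conf);
           Table table = conn.getTable(TableName.valueOf("test"))) {
        List<Put> puts = new ArrayList<>();
        for (int row = 0; row < 100; row++) {
          Put put = new Put(Bytes.toBytes("row-" + row));
          // 101 columns per Put: crosses the 100-column hotness threshold.
          for (int col = 0; col <= 100; col++) {
            put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("q" + col),
                Bytes.toBytes("v"));
          }
          puts.add(put);
        }
        // Expected to fail against an idle cluster once retries run out.
        table.batch(puts, new Object[puts.size()]);
      }
    }
  }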
Back to ReplicationSink: the default batch size is 5000 Puts and the default
number of retries is 4. For a table with wide rows (which can cause replication
to sink Puts with more than 100 columns), it becomes basically impossible to
replicate, because 4 retries is nowhere near enough to move through a batch of
up to 5000 writes, 20 at a time.
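Doing the math at the defaults: ceil(5000 / 20) = 250 attempts would be needed
to drain one full sink batch, but the sink gives up after 4 retries.
Replication can never make it through, and the queue just backs up.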