[
https://issues.apache.org/jira/browse/HBASE-26575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459454#comment-17459454
]
Bryan Beaudreault commented on HBASE-26575:
-------------------------------------------
Here's a test method which can reproduce this
{code:java}
@Test
public void testBigBatchPut() throws IOException, InterruptedException {
  TableName tableName = TableName.valueOf(name.getMethodName());
  TEST_UTIL.createTable(tableName, HBaseTestingUtility.fam1).close();
  byte[][] columns = new byte[101][];
  for (int i = 0; i < columns.length; i++) {
    columns[i] = Bytes.toBytes(i);
  }
  byte[][] rows = new byte[101][];
  for (int i = 0; i < rows.length; i++) {
    rows[i] = Bytes.toBytes(i);
  }
  Configuration c = new Configuration(TEST_UTIL.getConfiguration());
  c.setInt(HConstants.HBASE_CLIENT_RETRIES_NUMBER, 3);
  // There are 101 rows, so expecting initial attempt + 3 retries = 4 attempts.
  // For some reason I'm actually seeing 1 more retry than configured, so it's
  // actually 5 attempts.
  // 5 attempts * 20 rows = 100 rows, so the below will fail with an exception:
  // "Failed 1 action: org.apache.hadoop.hbase.RegionTooBusyException"
  try (Connection connection = ConnectionFactory.createConnection(c)) {
    try (Table t = connection.getTable(tableName)) {
      if (t instanceof HTable) {
        HTable table = (HTable) t;
        table.setOperationTimeout(3 * 1000);
        List<Put> puts = new ArrayList<>(rows.length);
        for (byte[] row : rows) {
          Put put = new Put(row);
          for (byte[] column : columns) {
            put.addColumn(HBaseTestingUtility.fam1, column, column);
          }
          puts.add(put);
        }
        table.batch(puts, null);
      }
    }
  }
}{code}
> StoreHotnessProtector may block Replication
> -------------------------------------------
>
> Key: HBASE-26575
> URL: https://issues.apache.org/jira/browse/HBASE-26575
> Project: HBase
> Issue Type: Bug
> Reporter: Bryan Beaudreault
> Priority: Major
>
> I'm upgrading from hbase1 to hbase2 and am still in my QA environment, where
> load is very low. Even so, I've noticed a bad interaction between Replication
> and the StoreHotnessProtector.
> The ReplicationSink collects edits from the WAL and executes them in batches
> via the normal HTable interface. Despite the name of this property, the max
> batch sizes are based on "hbase.rpc.rows.warning.threshold" which has a
> default of 5000.
> The StoreHotnessProtector defaults to allowing 10 concurrent writes (of 100
> columns or more) to a Store, or 20 concurrent "prepares" of said writes. The
> Prepare part is what causes issues here. When a batch mutate comes in, the RS
> first takes a lock on all rows in the batch. This happens in
> HRegion#lockRowsAndBuildMiniBatch, and the writes are recorded as "preparing"
> in StoreHotnessProtector before acquiring the lock. This recording basically
> increments a counter, and throws an exception if that counter goes over 20.
> Back in HRegion#lockRowsAndBuildMiniBatch, the exception is caught and
> recorded in the results for any items that failed. Any items that succeed
> continue on to write, unless the write is atomic, in which case it
> immediately throws an exception.
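> For illustration, the prepare-counter behavior described above can be sketched roughly as follows. This is a simplified, hypothetical model and not the actual StoreHotnessProtector code; the class and method names are invented, and the limits are just the defaults described in this issue (10 parallel puts times a multiplier of 2, giving 20 prepares):
{code:java}
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model of the per-Store "prepare" gate described above.
// Hypothetical names; defaults per the report: 10 concurrent writes x 2 = 20 prepares.
public class PrepareGateSketch {
  static final int PARALLEL_PUT_LIMIT = 10;
  static final int PREPARE_MULTIPLIER = 2;

  private final AtomicInteger preparing = new AtomicInteger();

  // Called before the row locks are taken, once per qualifying mutation on the Store.
  public void startPrepare() {
    int inFlight = preparing.incrementAndGet();
    if (inFlight > PARALLEL_PUT_LIMIT * PREPARE_MULTIPLIER) {
      preparing.decrementAndGet();
      // The real code throws RegionTooBusyException; a RuntimeException stands in here.
      throw new RuntimeException("too many concurrent prepares: " + inFlight);
    }
  }

  public void finishPrepare() {
    preparing.decrementAndGet();
  }
}
{code}
> Note the gate only looks at its own counter, never at actual RS load, which is the crux of the complaint below.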
> This response gets back to the client, which automatically handles retries.
> With enough retries, the batch call will eventually succeed because each
> retry contains fewer and fewer writes to handle. Assuming you have enough
> retries, this is basically enforcing an automatic chunking of a batch
> write into sub-batches of 20. Again, this only affects writes that hit more
> than 100 columns (by default).
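> Under that behavior, draining a batch of N qualifying rows needs roughly ceil(N / 20) attempts. A tiny (hypothetical) helper makes the arithmetic concrete; the 20-rows-per-attempt admission rate is the default prepare limit described above:
{code:java}
// Attempts needed to drain a batch when each attempt only admits `perAttempt` rows.
public class RetryMath {
  public static int attemptsNeeded(int rows, int perAttempt) {
    return (rows + perAttempt - 1) / perAttempt; // ceiling division
  }
}
{code}
> So the 101-row batch in the reproduction above needs 6 attempts (5 retries), and a full 5000-row replication batch would need 250 attempts, which is the failure mode described next.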
> At this point I'll say that this in general seems overly aggressive,
> especially since the StoreHotnessProtector doesn't actually do any checks for
> actual load on the RS. You could have a totally idle RegionServer and submit
> a single batch of 100 Puts with 101 columns each – if you don't have at least
> 5 retries configured, the batch will fail.
> Back to ReplicationSink, the default batch size is 5000 Puts and the default
> retries is 4. For a table with wide rows (which might cause replication to
> try to sink Puts with more than 100 columns), it becomes basically impossible
> to replicate because the number of retries is not nearly enough to move
> through a batch of up to 5000, 20 at a time.
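> If I'm reading the code right, the protector can be tuned (or effectively disabled) via configuration. The keys below are my understanding of the relevant settings and their defaults; they should be double-checked against the HBase 2.x version in use before relying on them:
{code:xml}
<!-- hbase-site.xml: settings believed to control StoreHotnessProtector
     (verify key names and defaults against your release) -->
<property>
  <!-- Max concurrent writes per Store for wide rows; 0 is believed to disable the protector -->
  <name>hbase.region.store.parallel.put.limit</name>
  <value>10</value>
</property>
<property>
  <!-- Prepare limit = put.limit * this multiplier (default 2, i.e. 20 prepares) -->
  <name>hbase.region.store.parallel.prepare.put.multiplier</name>
  <value>2</value>
</property>
<property>
  <!-- Only mutations touching at least this many columns are counted -->
  <name>hbase.region.store.parallel.put.limit.min.column.count</name>
  <value>100</value>
</property>
{code}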
--
This message was sent by Atlassian Jira
(v8.20.1#820001)