[ 
https://issues.apache.org/jira/browse/HBASE-26575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459454#comment-17459454
 ] 

Bryan Beaudreault commented on HBASE-26575:
-------------------------------------------

Here's a test method that reproduces this:
{code:java}
@Test public void testBigBatchPut() throws IOException, InterruptedException {
  TableName tableName = TableName.valueOf(name.getMethodName());
  TEST_UTIL.createTable(tableName, HBaseTestingUtility.fam1).close();

  byte[][] columns = new byte[101][];
  for (int i = 0; i < columns.length; i++) {
    columns[i] = Bytes.toBytes(i);
  }

  byte[][] rows = new byte[101][];
  for (int i = 0; i < rows.length; i++) {
    rows[i] = Bytes.toBytes(i);
  }

  Configuration c = new Configuration(TEST_UTIL.getConfiguration());
  c.setInt(HConstants.HBASE_CLIENT_RETRIES_NUMBER, 3);

  // there are 101 rows, so expecting initial attempt + 3 retries = 4 attempts
  // for some reason I'm actually seeing 1 more retry than configured, so it's
  // actually 5 attempts
  // 5 attempts * 20 rows = 100 rows
  // so the batch below will fail with an exception:
  // "Failed 1 action: org.apache.hadoop.hbase.RegionTooBusyException"

  try (Connection connection = ConnectionFactory.createConnection(c)) {
    try (Table t = connection.getTable(tableName)) {
      if (t instanceof HTable) {
        HTable table = (HTable) t;
        table.setOperationTimeout(3 * 1000);

        List<Put> puts = new ArrayList<>(rows.length);
        for (byte[] row : rows) {
          Put put = new Put(row);
          for (byte[] column : columns) {
            put.addColumn(HBaseTestingUtility.fam1, column, column);
          }
          puts.add(put);
        }

        table.batch(puts, null);
      }
    }
  }
}{code}

> StoreHotnessProtector may block Replication
> -------------------------------------------
>
>                 Key: HBASE-26575
>                 URL: https://issues.apache.org/jira/browse/HBASE-26575
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Bryan Beaudreault
>            Priority: Major
>
> I'm upgrading from hbase1 to hbase2, and I'm still in my QA environment where 
> load is very low. Even still, I've noticed some bad interaction between 
> Replication and the StoreHotnessProtector.
> The ReplicationSink collects edits from the WAL and executes them in batches 
> via the normal HTable interface. Despite its name, the max batch size is 
> governed by "hbase.rpc.rows.warning.threshold", which has a default of 5000. 
> The StoreHotnessProtector defaults to allowing 10 concurrent writes (of 100 
> columns or more) to a Store, or 20 concurrent "prepares" of said writes. The 
> Prepare part is what causes issues here. When a batch mutate comes in, the RS 
> first takes a lock on all rows in the batch. This happens in 
> HRegion#lockRowsAndBuildMiniBatch, and the writes are recorded as "preparing" 
> in StoreHotnessProtector before acquiring the lock. This recording basically 
> increments a counter, and throws an exception if that counter goes over 20.
> Back in HRegion#lockRowsAndBuildMiniBatch, the exception is caught and 
> recorded in the results for any items that failed. Any items that succeed 
> continue on to write, unless the write is atomic, in which case it 
> immediately throws an exception.
> This response gets back to the client, which automatically handles retries. 
> With enough retries, the batch call will eventually succeed because each 
> retry contains fewer and fewer writes to handle. Assuming you have enough 
> retries, this is basically enforcing an automatic chunking of a batch 
> write into sub-batches of 20. Again, this only affects writes that hit more 
> than 100 columns (by default).
> At this point I'll say that this in general seems overly aggressive, 
> especially since the StoreHotnessProtector doesn't check the actual load 
> on the RS. You could have a totally idle RegionServer and submit 
> a single batch of 100 Puts with 101 columns each – if you don't have at least 
> 5 retries configured, the batch will fail.
> Back to ReplicationSink, the default batch size is 5000 Puts and the default 
> retries is 4. For a table with wide rows (which might cause replication to 
> try to sink Puts with more than 100 columns), it becomes basically impossible 
> to replicate because the number of retries is not nearly enough to move 
> through a batch of up to 5000, 20 at a time.
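
The prepare-gating described above can be sketched as a simple counter check. This is a minimal illustration under the defaults mentioned in the description (20 concurrent prepares, 100-column threshold), not the actual StoreHotnessProtector code; the class and method names are made up:
{code:java}
import java.util.concurrent.atomic.AtomicInteger;

public class PrepareCounterSketch {
  static final int PARALLEL_PREPARE_LIMIT = 20; // assumed default
  static final int COLUMN_THRESHOLD = 100;      // only "wide" writes are counted

  private final AtomicInteger preparing = new AtomicInteger();

  /** Returns true if the write may proceed; false models the RegionTooBusyException. */
  boolean tryStartPrepare(int columnCount) {
    if (columnCount < COLUMN_THRESHOLD) {
      return true; // narrow writes bypass the protector
    }
    if (preparing.incrementAndGet() > PARALLEL_PREPARE_LIMIT) {
      preparing.decrementAndGet();
      return false; // over the limit: the real code throws RegionTooBusyException
    }
    return true;
  }

  void finishPrepare(int columnCount) {
    if (columnCount >= COLUMN_THRESHOLD) {
      preparing.decrementAndGet();
    }
  }

  public static void main(String[] args) {
    PrepareCounterSketch p = new PrepareCounterSketch();
    int admitted = 0;
    // a ReplicationSink-sized batch of 5000 wide (101-column) Puts, all
    // arriving before any prepare finishes
    for (int i = 0; i < 5000; i++) {
      if (p.tryStartPrepare(101)) {
        admitted++;
      }
    }
    System.out.println(admitted); // only 20 of 5000 admitted per attempt
    // ceil(5000 / 20) = 250 attempts needed, far beyond the 4 default retries
    System.out.println((5000 + 19) / 20);
  }
}
{code}
With only the per-attempt counter standing between a 5000-row replication batch and the Store, the retry budget is exhausted long before the batch drains.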



--
This message was sent by Atlassian Jira
(v8.20.1#820001)