bbeaudreault commented on code in PR #5713:
URL: https://github.com/apache/hbase/pull/5713#discussion_r1518902313
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/quotas/DefaultOperationQuota.java:
##########
@@ -158,24 +193,60 @@ public void addMutation(final Mutation mutation) {
* Update estimate quota(read/write size/capacityUnits) which will be consumed
* @param numWrites the number of write requests
* @param numReads the number of read requests
- * @param numScans the number of scan requests
*/
- protected void updateEstimateConsumeQuota(int numWrites, int numReads, int numScans) {
+ protected void updateEstimateConsumeBatchQuota(int numWrites, int numReads) {
writeConsumed = estimateConsume(OperationType.MUTATE, numWrites, 100);
if (useResultSizeBytes) {
readConsumed = estimateConsume(OperationType.GET, numReads, 100);
- readConsumed += estimateConsume(OperationType.SCAN, numScans, 1000);
} else {
// assume 1 block required for reads. this is probably a low estimate, which is okay
readConsumed = numReads > 0 ? blockSizeBytes : 0;
- readConsumed += numScans > 0 ? blockSizeBytes : 0;
}
writeCapacityUnitConsumed = calculateWriteCapacityUnit(writeConsumed);
readCapacityUnitConsumed = calculateReadCapacityUnit(readConsumed);
}
+ /**
+ * Update estimate quota(read/write size/capacityUnits) which will be consumed
+ * @param scanRequest the scan to be executed
+ * @param maxScannerResultSize the maximum bytes to be returned by the scanner
+ * @param maxBlockBytesScanned the maximum bytes scanned in a single RPC call by the scanner
+ */
+ protected void updateEstimateConsumeScanQuota(ClientProtos.ScanRequest scanRequest,
+ long maxScannerResultSize, long maxBlockBytesScanned) {
+ if (useResultSizeBytes) {
+ readConsumed = estimateConsume(OperationType.GET, 1, 1000);
+ } else {
+ /*
+ * Estimating scan workload is more complicated, and if we severely underestimate workloads
+ * then throttled clients will exhaust retries too quickly, and could saturate the RPC layer.
+ * We have access to the ScanRequest's nextCallSeq number, the maxScannerResultSize, and the
+ * maxBlockBytesScanned by every relevant Scanner#next call. With these inputs we can make a
+ * more informed estimate about the scan's workload.
+ */
+ long estimate;
+ if (scanRequest.getNextCallSeq() == 0) {
+ // start scanners with an optimistic 1 block IO estimate
+ // it is better to underestimate a large scan in the beginning
+ // than to overestimate, and block, a small scan
+ estimate = blockSizeBytes;
+ } else {
+ // scanner result sizes will be limited by quota availability, regardless of
+ // maxScannerResultSize. This means that we cannot safely assume that a long-running
+ // scan with a small maxBlockBytesScanned would not prefer to pull down
+ // a larger payload. So we should estimate with the assumption that long-running scans
+ // are appropriately configured to approach their maxScannerResultSize per RPC call
+ estimate =
Review Comment:
So every time a next() comes in, we double the previous BBS as the new limit. Say someone is doing a long scan with a caching of 1: each call does only 1 block of IO, but we still double the estimate. It becomes harder to acquire quota if we are always estimating 2x more than we really fetch. The doubling is good for reaching a plateau quickly, but then it should stop.
For the 1 block / caching 1 case, it's only a 64 KB actual vs 128 KB estimate. Not a huge deal. But say we have a scan that's doing 25 MB per next(). Setting the estimate to 50 MB each time is quite a difference. Ideally, once the scan itself plateaus in BBS, that observed value should be the limit going forward. So for the 25 MB example, the estimates go 64 KB, 128 KB, 256 KB, etc., up to 25 MB. On the next request the estimate is 50 MB, but the resulting BBS is only 25 MB. After that we should just use 25 MB as the estimate going forward.
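
The plateau behavior the comment describes could be sketched roughly as follows. This is a minimal illustration, not the patch's actual code: the class and method names (`ScanQuotaEstimate`, `nextEstimate`) are hypothetical, and the heuristic shown (double while the observed BBS is still growing, otherwise reuse the observed value) is just the reviewer's suggestion under the stated assumption that observed BBS never exceeds the estimate, since quota availability caps result sizes.

```java
// Hypothetical sketch of the suggested estimate update. Names are
// illustrative and do not exist in the HBase codebase.
public class ScanQuotaEstimate {

  /**
   * @param prevObservedBbs the maxBlockBytesScanned seen on the previous next() call
   * @param curObservedBbs  the maxBlockBytesScanned seen on the latest next() call
   * @return the quota estimate for the next RPC call
   */
  public static long nextEstimate(long prevObservedBbs, long curObservedBbs) {
    if (curObservedBbs > prevObservedBbs) {
      // still ramping up: keep doubling so we reach the plateau quickly
      return curObservedBbs * 2;
    }
    // plateaued: the scan fetched no more than last time, so the observed
    // workload is a better estimate than 2x going forward
    return curObservedBbs;
  }

  public static void main(String[] args) {
    long kb = 1024, mb = 1024 * kb;
    long estimate = 64 * kb;   // optimistic 1-block starting estimate
    long plateau = 25 * mb;    // scan actually reads 25 MB per next()
    long prevObserved = 0;
    for (int i = 0; i < 20; i++) {
      // assumption from the comment thread: quota caps the fetch at the estimate
      long observed = Math.min(estimate, plateau);
      estimate = nextEstimate(prevObserved, observed);
      prevObserved = observed;
    }
    // after ramp-up, the estimate settles at the observed 25 MB plateau
    System.out.println(estimate); // prints 26214400
  }
}
```

In the caching-1 case this overshoots only once (one 128 KB estimate for a 64 KB workload) before settling, while the 25 MB scan still ramps up exponentially but stops oscillating at 2x once the plateau is reached.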
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]