[GitHub] [hbase] bbeaudreault commented on a diff in pull request #4967: HBASE-27558 Scan quotas and limits should account for total block IO

GitBox Fri, 20 Jan 2023 08:41:03 -0800


bbeaudreault commented on code in PR #4967:
URL: https://github.com/apache/hbase/pull/4967#discussion_r1082800609



##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionScannerImpl.java:
##########
@@ -608,6 +610,11 @@ private boolean nextInternal(List<Cell> results, 
ScannerContext scannerContext)
           return 
scannerContext.setScannerState(NextState.NO_MORE_VALUES).hasMoreValues();
         }
         if (!shouldStop) {
+          // Read nothing as the cells were filtered, but still need to check 
time limit.

Review Comment:
   Actually, this section of the method is not nearly as hot as the rest of the 
method. The only real way we reach this point is when 
`filter.filterRowCells(kvs)` clears all cells from the results after having 
been accumulated in StoreScanner.  There are only 2 standard filters which do 
this -- DependentColumnFilter and SingleColumnValueExcludeFilter.
   
   That said, you do make a good point.  We have never had a time limit here, 
so we may not need it. We _do_ need a size limit check here, now that we track 
block sizes. Previously, we would not check size limit here because the results 
are empty so wouldn't have accumulated size progress. Now that we accumulate 
block size progress even for filtered rows, we need a check. 
   
   For this type of scan, we will have accumulated blocks in both 
`populateResults()` and possibly `nextRow()`. Right after `populateResults()` 
there's a `scannerContext.checkAnyLimitReached(LimitScope.BETWEEN_CELLS)` call. 
That call doesn't protect against this case, because it passes `BETWEEN_CELLS`. 
 For scans with `filter.hasFilterRow()`, the limit scope is changed to 
`LimitScope.BETWEEN_ROWS`. So this check is skipped for these.  Scans which 
enter this code block will have skipped all other limit checks above. The 
checkSizeLimit I add here is the only safe place we can check BETWEEN_ROWS for 
these types of filtered scans.
   
   The best way to illustrate this is with a test -- I just pushed a change 
which does the following:
   
   1. Change this line to just `checkSizeLimit`
   2. Adds a new test `testCheckLimitAfterFilteringRowCells`
   
   If I comment out this checkSizeLimit, the added test fails -- the whole scan 
is able to complete in 1 rpc instead of the expected 4. So this illustrates 
that we need to have a size check here.
   
   Personally I think it's also accurate to have a time limit check here, 
because for these types of scans I think they'd be able to circumvent our 
existing time limits. But within the scope of this JIRA, I can keep it to just 
size limit for now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [hbase] bbeaudreault commented on a diff in pull request #4967: HBASE-27558 Scan quotas and limits should account for total block IO

Reply via email to