bbeaudreault commented on code in PR #4967:
URL: https://github.com/apache/hbase/pull/4967#discussion_r1082800609
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionScannerImpl.java:
##########
@@ -608,6 +610,11 @@ private boolean nextInternal(List<Cell> results,
ScannerContext scannerContext)
return
scannerContext.setScannerState(NextState.NO_MORE_VALUES).hasMoreValues();
}
if (!shouldStop) {
+ // Read nothing as the cells were filtered, but still need to check
time limit.
Review Comment:
Actually, this section of the method is not nearly as hot as the rest of the
method. The only real way we reach this point is when
`filter.filterRowCells(kvs)` clears all cells from the results after having
been accumulated in StoreScanner. There are only 2 standard filters which do
this -- DependentColumnFilter and SingleColumnValueExcludeFilter.
That said, you do make a good point. We have never had a time limit here,
so we may not need it. We _do_ need a size limit check here, now that we track
block sizes. Previously, we would not check size limit here because the results
are empty so wouldn't have accumulated size progress. Now that we accumulate
block size progress even for filtered rows, we need a check.
For this type of scan, we will have accumulated blocks in both
`populateResults()` and possibly `nextRow()`. Right after `populateResults()`
there's a `scannerContext.checkAnyLimitReached(LimitScope.BETWEEN_CELLS)` call.
That call doesn't protect against this case, because it passes `BETWEEN_CELLS`.
For scans with `filter.hasFilterRow()`, the limit scope is changed to
`LimitScope.BETWEEN_ROWS`. So this check is skipped for these. Scans which
enter this code block will have skipped all other limit checks above. The
checkSizeLimit I add here is the only safe place we can check BETWEEN_ROWS for
these types of filtered scans.
The best way to illustrate this is with a test -- I just pushed a change
which does the following:
1. Change this line to just `checkSizeLimit`
2. Adds a new test `testCheckLimitAfterFilteringRowCells`
If I comment out this checkSizeLimit, the added test fails -- the whole scan
is able to complete in 1 rpc instead of the expected 4. So this illustrates
that we need to have a size check here.
Personally I think it's also accurate to have a time limit check here,
because for these types of scans I think they'd be able to circumvent our
existing time limits. But within the scope of this JIRA, I can keep it to just
size limit for now.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]