[
https://issues.apache.org/jira/browse/HBASE-16296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15399058#comment-15399058
]
ramkrishna.s.vasudevan commented on HBASE-16296:
------------------------------------------------
I think I figured out the problem. It happens only when the scan.setCaching()
value equals the PageFilter page size, and only when the scan uses a filter
list that contains the PageFilter.
The main cause is the FilterList implementation of filterRowKey():
{code}
if (this.operator == Operator.MUST_PASS_ALL) {
  if (filter.filterAllRemaining() ||
      filter.filterRowKey(rowKey, offset, length)) {
    flag = true;
  }
}
{code}
Here, for PageFilter, once the required rows have been fetched and pageSize is
reached, filterAllRemaining() always returns true, so the filter list's
filterRowKey() returns true as well.
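To make that interaction concrete, here is a minimal sketch that models it
without the HBase classes. RowFilter, PageLikeFilter and MustPassAllList are
hypothetical stand-ins that only mirror the behaviour described above; they
are not the real Filter, PageFilter or FilterList implementations.

```java
import java.util.Arrays;
import java.util.List;

final class FilterListSketch {

  /** Hypothetical stand-in for the two filter methods discussed here. */
  interface RowFilter {
    boolean filterAllRemaining();
    boolean filterRowKey(String rowKey);
  }

  /** Models a page filter: never rejects by row key, but is "done" once the page is full. */
  static final class PageLikeFilter implements RowFilter {
    private final int pageSize;
    private int rowsAccepted;

    PageLikeFilter(int pageSize) { this.pageSize = pageSize; }

    /** Called by the (modelled) scanner each time a row is returned. */
    void rowAccepted() { rowsAccepted++; }

    @Override public boolean filterAllRemaining() { return rowsAccepted >= pageSize; }

    @Override public boolean filterRowKey(String rowKey) { return false; }
  }

  /** Models the MUST_PASS_ALL branch quoted above. */
  static final class MustPassAllList implements RowFilter {
    private final List<RowFilter> filters;

    MustPassAllList(RowFilter... filters) { this.filters = Arrays.asList(filters); }

    @Override public boolean filterAllRemaining() {
      for (RowFilter f : filters) {
        if (f.filterAllRemaining()) return true;
      }
      return false;
    }

    @Override public boolean filterRowKey(String rowKey) {
      boolean flag = false;
      for (RowFilter f : filters) {
        // Same condition as the quoted snippet: a "done" filter rejects the row key.
        if (f.filterAllRemaining() || f.filterRowKey(rowKey)) flag = true;
      }
      return flag;
    }
  }

  public static void main(String[] args) {
    PageLikeFilter page = new PageLikeFilter(2);
    RowFilter list = new MustPassAllList(page);
    System.out.println(list.filterRowKey("row-a")); // false: page not full yet
    page.rowAccepted();
    page.rowAccepted();
    System.out.println(page.filterRowKey("row-c")); // false: bare page filter
    System.out.println(list.filterRowKey("row-c")); // true: wrapped in the list
  }
}
```

The last two lines are the whole difference: the same page filter answers
false on its own and true through the list once the page is full.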
When the HRegion does the scan
{code}
if (filterRowKey(current)) {
  incrementCountOfRowsFilteredMetric(scannerContext);
  // Typically the count of rows scanned is incremented inside #populateResult. However,
  // here we are filtering a row based purely on its row key, preventing us from calling
  // #populateResult. Thus, perform the necessary increment here to rows scanned metric
  incrementCountOfRowsScannedMetric(scannerContext);
  boolean moreRows = nextRow(scannerContext, current);
  if (!moreRows) {
    return scannerContext.setScannerState(NextState.NO_MORE_VALUES).hasMoreValues();
  }
  results.clear();
  continue;
}
{code}
So after the results have been fetched up to the caching count (which equals
the page count), scan.next() enters the code above and sees filterRowKey()
return true (with a filter list; with no filter list it returns false). The
body of the 'if' block is expensive: it moves to the next row and then applies
the filterRowKey() logic again, row after row. For a reverse scan it is much
more expensive, because moving to the next row means a seek to the previous
row, and the reverse scan ends up fetching almost all the rows before it
reports that there is nothing more to read. (When no stopRow is set in the
scan, this is a full table scan.)
When the PageFilter is not wrapped in a filter list, filterRowKey() returns
false, so we try to fetch the next row through the ScanQueryMatcher layer, and
that layer does not fetch any results because filterAllRemaining() is again
true. Since there are no results at the HRegion level, we return saying there
is no result. This case does do an extra scan, but only up to the next row;
with a filter list the scan runs to the end, so performance degrades.
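As a rough illustration of the cost difference, the sketch below counts how
many rows the scanner touches on the call made after the page is already full.
It is a simplified model and not HRegion code: rowsTouchedAfterPageFull and
its parameters are hypothetical, and rows are represented only by their
position in the region.

```java
final class ScanCostSketch {

  /**
   * Counts rows touched by the next() call issued after pageSize rows
   * (== caching) have already been returned.
   *
   * @param totalRows    rows in the modelled region, no stopRow set
   * @param pageSize     page size, assumed equal to the scanner caching
   * @param inFilterList true if the page filter sits inside a MUST_PASS_ALL list
   */
  static int rowsTouchedAfterPageFull(int totalRows, int pageSize, boolean inFilterList) {
    int position = pageSize; // rows [0, pageSize) were already returned
    int touched = 0;
    while (position < totalRows) {
      touched++;
      // The page is full, so filterAllRemaining() is true. Through the list that
      // makes filterRowKey() true; the bare page filter still answers false.
      boolean rowKeyFiltered = inFilterList;
      if (rowKeyFiltered) {
        position++; // nextRow(): for a reverse scan this is a seek to the previous row
        continue;   // and the row-key check is applied again to the new row
      }
      // Bare page filter: the matcher sees filterAllRemaining() and returns no results.
      break;
    }
    return touched;
  }

  public static void main(String[] args) {
    System.out.println(rowsTouchedAfterPageFull(1000, 10, true));  // 990
    System.out.println(rowsTouchedAfterPageFull(1000, 10, false)); // 1
  }
}
```

With a filter list every remaining row is visited before the scanner reports
that it is done; without one only a single extra row is looked at.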
> Reverse scan performance degrades when scanner cache size matches page filter
> size
> ----------------------------------------------------------------------------------
>
> Key: HBASE-16296
> URL: https://issues.apache.org/jira/browse/HBASE-16296
> Project: HBase
> Issue Type: Bug
> Reporter: James Taylor
> Attachments: generatedata-snippet.java, repro-snippet.java
>
>
> When a reverse scan is done, the server seems to not know it's done when the
> scanner cache size matches the number of rows in a PageFilter. See
> PHOENIX-3121 for how this manifests itself. We have a standalone, pure HBase
> API reproducer too that I'll attach (courtesy of [~churromorales] and
> [~mujtabachohan]).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)