[jira] [Commented] (HBASE-16296) Reverse scan performance degrades when scanner cache size matches page filter size

ramkrishna.s.vasudevan (JIRA) Fri, 29 Jul 2016 02:53:29 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-16296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15399058#comment-15399058
 ]


ramkrishna.s.vasudevan commented on HBASE-16296:
------------------------------------------------

I think I figured out the problem. Yes, this happens only when scan.setCaching()
is equal to the PageFilter page size, and only when the scan uses a FilterList
that contains the PageFilter.

The main issue is FilterList's implementation of filterRowKey():
{code}
      if (this.operator == Operator.MUST_PASS_ALL) {
        // Any member filter that reports filterAllRemaining() rejects the row key.
        if (filter.filterAllRemaining() ||
            filter.filterRowKey(rowKey, offset, length)) {
          flag = true;
        }
      }
{code}
Here, for PageFilter, once the required rows have been fetched and pageSize is
reached, filterAllRemaining() always returns true, so the FilterList's
filterRowKey() returns true for every subsequent row.
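To make the difference concrete, here is a minimal, self-contained Java sketch of the two filterRowKey() behaviours. PageFilterModel, MustPassAllList and their methods are hypothetical stand-ins written for illustration, not the HBase Filter/FilterList API; the sketch only assumes what is described above: a bare PageFilter does not reject rows by key, while a MUST_PASS_ALL list rejects every row once a member filter reports filterAllRemaining().

```java
import java.util.List;

// Hypothetical, simplified model of the filter semantics described above.
// These are NOT the HBase Filter/FilterList classes; names are illustrative.
public class FilterListSketch {

    interface RowFilter {
        boolean filterAllRemaining();
        boolean filterRowKey(String rowKey);
    }

    // Stand-in for PageFilter: "done" once pageSize rows have been returned.
    static class PageFilterModel implements RowFilter {
        private final long pageSize;
        private long rowsReturned = 0;

        PageFilterModel(long pageSize) { this.pageSize = pageSize; }

        // Called by the (simulated) scanner each time a row is returned.
        void rowReturned() { rowsReturned++; }

        @Override
        public boolean filterAllRemaining() { return rowsReturned >= pageSize; }

        // Per the comment above, the bare filter never rejects a row by its key.
        @Override
        public boolean filterRowKey(String rowKey) { return false; }
    }

    // Stand-in for a FilterList with MUST_PASS_ALL, mirroring the quoted loop body.
    static class MustPassAllList implements RowFilter {
        private final List<RowFilter> filters;

        MustPassAllList(List<RowFilter> filters) { this.filters = filters; }

        // Model assumption: the list is done as soon as any member is done.
        @Override
        public boolean filterAllRemaining() {
            return filters.stream().anyMatch(RowFilter::filterAllRemaining);
        }

        @Override
        public boolean filterRowKey(String rowKey) {
            boolean flag = false;
            for (RowFilter filter : filters) {
                // A member that is "done" rejects every subsequent row key.
                if (filter.filterAllRemaining() || filter.filterRowKey(rowKey)) {
                    flag = true;
                }
            }
            return flag;
        }
    }

    public static void main(String[] args) {
        PageFilterModel page = new PageFilterModel(2);
        MustPassAllList list = new MustPassAllList(List.of(page));
        page.rowReturned();
        page.rowReturned(); // the page is now full
        System.out.println("bare PageFilter rejects r3: " + page.filterRowKey("r3"));
        System.out.println("FilterList rejects r3:      " + list.filterRowKey("r3"));
    }
}
```

Running main prints `false` for the bare filter and `true` for the list, which is the asymmetry that sends HRegion into the row-skipping branch.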

When HRegion does the scan, it runs this code:
{code}
if (filterRowKey(current)) {
  incrementCountOfRowsFilteredMetric(scannerContext);
  // Typically the count of rows scanned is incremented inside #populateResult. However,
  // here we are filtering a row based purely on its row key, preventing us from calling
  // #populateResult. Thus, perform the necessary increment here to rows scanned metric
  incrementCountOfRowsScannedMetric(scannerContext);
  boolean moreRows = nextRow(scannerContext, current);
  if (!moreRows) {
    return scannerContext.setScannerState(NextState.NO_MORE_VALUES).hasMoreValues();
  }
  results.clear();
  continue;
}
{code}
So once the results have been fetched up to the caching count (which is also
the page size), the next scan.next() call enters the block above and finds that
filterRowKey() returns true (with a FilterList; without one, it returns false).
The code inside that 'if' block is expensive, because it moves to the next row
and then applies the filterRowKey() logic again, row after row. For a reverse
scan this is even more expensive, because moving to the previous row requires a
seek, which is costlier, and the reverse scan ends up visiting almost all the
remaining rows before reporting that there is nothing more to read. (When no
stopRow is set on the scan, this is a full table scan.)

When the PageFilter is not wrapped in a FilterList, filterRowKey() returns
false, so we try to fetch the next row through the ScanQueryMatcher layer. That
layer does not return any results, again because filterAllRemaining() is true,
so at the HRegion level there are no results and we return saying there are no
more results. In this case there is still an extra scan, but only up to the
next row; with a FilterList the scan runs till the end, and so performance
degrades.
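A rough cost model of the extra next() call that follows a full page may help compare the two paths. It is a sketch under stated assumptions, not HBase code: each step to the next row costs one unit, a reverse step (a seek to the previous row) costs a hypothetical REVERSE_SEEK_COST units, and rowsLeft is the number of rows between the last returned row and the end of the scan range (the stopRow, or the table boundary when none is set). The class name, method and weight are illustrative only; the weight is not a measurement.

```java
// Hypothetical cost model of the two code paths described above; not HBase code.
public class ScanCostSketch {

    // Assumed relative cost of a reverse step; illustrative, not measured.
    static final int REVERSE_SEEK_COST = 5;

    /**
     * Cost of the next() call made after the page is already full.
     *
     * @param rowsLeft   rows remaining before the end of the scan range
     * @param filterList true if the PageFilter is wrapped in a FilterList
     * @param reversed   true for a reverse scan
     */
    static long costOfExtraNext(int rowsLeft, boolean filterList, boolean reversed) {
        long rowsVisited = 0;
        for (int row = 0; row < rowsLeft; row++) {
            rowsVisited++; // step onto this row (nextRow, or seek to the previous row)
            if (!filterList) {
                // Bare PageFilter: filterRowKey() is false, the matcher returns
                // nothing because filterAllRemaining() is true, so the scan ends here.
                break;
            }
            // FilterList: filterRowKey() is true, so HRegion skips the row and continues.
        }
        return rowsVisited * (reversed ? REVERSE_SEEK_COST : 1);
    }

    public static void main(String[] args) {
        int rowsLeft = 1_000_000;
        for (boolean reversed : new boolean[] {false, true}) {
            System.out.printf("reversed=%b  FilterList=%d  bare=%d%n",
                    reversed,
                    costOfExtraNext(rowsLeft, true, reversed),
                    costOfExtraNext(rowsLeft, false, reversed));
        }
    }
}
```

With one million rows left, this prints costs of 1000000 vs 1 for a forward scan and 5000000 vs 5 for a reverse scan: the FilterList path grows with the size of the remaining range, while the bare-filter path stays constant.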

> Reverse scan performance degrades when scanner cache size matches page filter size
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-16296
>                 URL: https://issues.apache.org/jira/browse/HBASE-16296
>             Project: HBase
>          Issue Type: Bug
>            Reporter: James Taylor
>         Attachments: generatedata-snippet.java, repro-snippet.java
>
>
> When a reverse scan is done, the server seems to not know it's done when the 
> scanner cache size matches the number of rows in a PageFilter. See 
> PHOENIX-3121 for how this manifests itself. We have a standalone, pure HBase 
> API reproducer too that I'll attach (courtesy of [~churromorales] and 
> [~mujtabachohan]).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
