[
https://issues.apache.org/jira/browse/HBASE-14397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340245#comment-15340245
]
Mikhail Antonov commented on HBASE-14397:
-----------------------------------------
[~apurtell] re-resolving this issue as it hasn't been reverted or amended. Do
you want to open a follow-up issue?
> PrefixFilter doesn't filter all remaining rows if the prefix is longer than
> rowkey being compared
> -------------------------------------------------------------------------------------------------
>
> Key: HBASE-14397
> URL: https://issues.apache.org/jira/browse/HBASE-14397
> Project: HBase
> Issue Type: Improvement
> Components: Filters
> Affects Versions: 2.0.0
> Reporter: Jianwei Cui
> Assignee: Jianwei Cui
> Priority: Minor
> Attachments: HBASE-14397-trunk-v1.patch
>
>
> PrefixFilter filters rowkeys as follows:
> {code}
> public boolean filterRowKey(Cell firstRowCell) {
> ...
> int length = firstRowCell.getRowLength();
> if (length < prefix.length) return true; // ===> return directly if the prefix is longer
> ....
> if ((!isReversed() && cmp > 0) || (isReversed() && cmp < 0)) {
> passedPrefix = true;
> }
> filterRow = (cmp != 0);
> return filterRow;
> }
> {code}
> If the prefix is longer than the current rowkey, PrefixFilter#filterRowKey
> filters the rowkey directly without comparing, so the 'passedPrefix' flag is
> never set, even when the current row is larger than the prefix.
> For example, if there are three rows 'a', 'b' and 'c' in the table, and we
> issue a scan request as:
> {code}
> hbase(main):001:0> scan 'test_table', {STARTROW => 'a', FILTER =>
> "(PrefixFilter ('aa'))"}
> {code}
> The region server will check all three rows before returning. In our
> production, a user issued a scan with a PrefixFilter whose prefix was longer
> than the rowkeys of the following millions of rows, so the region server
> kept checking rows until it hit a rowkey at least as long as the prefix. This
> makes the client time out easily. To fix this case, it seems we need to compare
> the prefix with the rowkey every several rows even when the prefix is longer.
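
The behavior described above can be reproduced with a small standalone model. This is only a sketch under stated assumptions: PrefixFilterSketch, its compareShortRows switch and the rowsChecked helper are hypothetical names invented for illustration. It is not the HBase class and not the attached patch. It models forward scans only and compares on every row, rather than every several rows as suggested above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Simplified stand-in for a prefix filter (hypothetical, forward scans only). */
class PrefixFilterSketch {
    private final byte[] prefix;
    // Hypothetical switch: false mimics the early return quoted in the issue,
    // true compares the row with the prefix even when the row is shorter.
    private final boolean compareShortRows;
    private boolean passedPrefix = false;

    PrefixFilterSketch(String prefix, boolean compareShortRows) {
        this.prefix = prefix.getBytes(StandardCharsets.UTF_8);
        this.compareShortRows = compareShortRows;
    }

    /** Returns true when the row should be filtered out. */
    boolean filterRowKey(byte[] row) {
        if (row.length < prefix.length && !compareShortRows) {
            return true; // early return: passedPrefix is never updated
        }
        int n = Math.min(row.length, prefix.length);
        int cmp = 0;
        // Unsigned lexicographic comparison over the first n bytes.
        for (int i = 0; i < n && cmp == 0; i++) {
            cmp = (row[i] & 0xff) - (prefix[i] & 0xff);
        }
        if (cmp == 0 && row.length < prefix.length) {
            cmp = -1; // the row is a proper prefix of the prefix, so it sorts before it
        }
        if (cmp > 0) {
            passedPrefix = true; // every later row in a forward scan is past the prefix
        }
        return cmp != 0;
    }

    /** True once no later row can match, so the scan may stop. */
    boolean filterAllRemaining() {
        return passedPrefix;
    }

    /** Counts how many rows of a sorted list are examined before the scan ends. */
    static int rowsChecked(List<String> sortedRows, String prefix, boolean compareShortRows) {
        PrefixFilterSketch filter = new PrefixFilterSketch(prefix, compareShortRows);
        int checked = 0;
        for (String row : sortedRows) {
            if (filter.filterAllRemaining()) {
                break;
            }
            checked++;
            filter.filterRowKey(row.getBytes(StandardCharsets.UTF_8));
        }
        return checked;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "b", "c");
        System.out.println(rowsChecked(rows, "aa", false)); // 3: every row is examined
        System.out.println(rowsChecked(rows, "aa", true));  // 2: the scan stops after 'b'
    }
}
```

With rows 'a', 'b' and 'c' and prefix 'aa', the early-return variant examines all three rows, while the comparing variant sets passedPrefix at 'b' and stops. On a table with millions of short rowkeys the same difference decides whether the scan ends early or runs until the client times out.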
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)