[
https://issues.apache.org/jira/browse/HBASE-8555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13663797#comment-13663797
]
Lars Hofhansl commented on HBASE-8555:
--------------------------------------
Sorry for chiming in late here. This is a problem with RowFilter, right?
There are only three filters that implement both filterRowKey and
filterKeyValue:
# RowFilter: Does not reimplement the check in filterKeyValue
# RandomRowFilter: Has the same problem
# WhileMatchFilter: Implements proper checks in both filterRowKey and
filterKeyValue
So only RowFilter and RandomRowFilter have this problem. Might be better to
just fix these two.
The fix would just be to turn filterOutRow into a Boolean (with capital B), redo
the test on the row key of the KV passed into filterKeyValue only if
filterOutRow is null, and then set it accordingly.
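To make the idea concrete, here is a minimal sketch of the lazily evaluated flag. It is not the real RowFilter: the class name LazyRowFilterSketch, the includeKeyValue method (a boolean stand-in for filterKeyValue's return code), and the use of String row keys with a Predicate are all assumptions made to keep the example self-contained.

```java
import java.util.function.Predicate;

/** Hypothetical stand-in for a row-key filter; not the real HBase RowFilter. */
class LazyRowFilterSketch {
    private final Predicate<String> keepRow;
    // null means "not yet decided for the current row".
    private Boolean filterOutRow = null;

    LazyRowFilterSketch(Predicate<String> keepRow) {
        this.keepRow = keepRow;
    }

    /** Called at the start of every row. */
    void reset() {
        filterOutRow = null;
    }

    /** Returns true if the row should be dropped. */
    boolean filterRowKey(String rowKey) {
        filterOutRow = !keepRow.test(rowKey);
        return filterOutRow;
    }

    /** Returns true if the cell should be kept; rowKey is the row of the KV. */
    boolean includeKeyValue(String rowKey) {
        if (filterOutRow == null) {
            // filterRowKey() was skipped for this row, so redo the check here.
            filterOutRow = !keepRow.test(rowKey);
        }
        return !filterOutRow;
    }

    public static void main(String[] args) {
        LazyRowFilterSketch f = new LazyRowFilterSketch(r -> r.compareTo("row4") > 0);
        f.reset();
        System.out.println(f.includeKeyValue("row1")); // false, although filterRowKey() never ran
        f.reset();
        System.out.println(f.includeKeyValue("row7")); // true
    }
}
```

The row-key comparison runs at most once per row in either call order, so the extra cost is only paid when filterRowKey was skipped.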
That said, I'm fine with the current fix too if you guys think this is a better
fix. A gain in performance does not trump correctness.
> FilterList correctness was dominated by sub-filter(list) ordering randomly
> --------------------------------------------------------------------------
>
> Key: HBASE-8555
> URL: https://issues.apache.org/jira/browse/HBASE-8555
> Project: HBase
> Issue Type: Bug
> Components: Filters
> Affects Versions: 0.94.3
> Reporter: Liang Xie
> Assignee: Liang Xie
> Priority: Critical
> Attachments: 8555-trunk-v1.txt, HBASE-8555-0.94.txt,
> HBASE-8555-0.94-v2.txt, HBASE-8555-0.94-v3.txt
>
>
> Say there are 10 rows, and the column value is i%2:
> row0 0
> row1 1
> row2 0
> row3 1
> row4 0
> row5 1
> row6 0
> row7 1
> row8 0
> row9 1
> 1: filter : row filter > row4 ===> row5 row6 row7 row8 row9
> 2: subFilterList: row filter <= row4 && column==0 ===> row0 row2 row4
> 3.1 filterlist[expected] filter || subFilterList ===> row0 row2 row4 row5
> row6 row7 row8 row9
> 3.2 filterlist[BUGON!] subFilterList || filter ===> row0 row1 row2 row3 row4
> row5 row6 row7 row8 row9
> (Please refer to the new testNestedFilterListWithSCVF case)
> I found it when I tried to translate the following SQL into an HBase scan
> statement:
> select xxx from xxx where (pk <= xxx and column1 = xxx) or pk > xxx
> My finding is that we assumed a call sequence for the filter methods:
> e.g. filterRowKey() should be called before filterKeyValue(),
> and the original FilterList.filterRowKey impl broke it by returning early
> (fast short-circuit).
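The ordering dependence in the description can be reproduced with a small toy model. This is only a sketch under simplifying assumptions, not HBase code: the interface F, the helpers rowFilter, valueEquals, and, or, and the scan loop are all hypothetical, one cell per row is assumed, and filterKeyValue is reduced to a boolean includeCell. The or helper returns from filterRowKey as soon as one sub-filter accepts the row, which is the short-circuit being discussed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class FilterOrderingSketch {
    /** Simplified stand-in for a filter; not the real HBase interface. */
    interface F {
        void reset();
        boolean filterRowKey(String row);           // true = drop the row
        boolean includeCell(String row, int value); // true = keep the cell
    }

    /** Row filter that trusts filterRowKey() to have run first. */
    static F rowFilter(Predicate<String> keep) {
        return new F() {
            boolean filterOutRow = false;
            public void reset() { filterOutRow = false; }
            public boolean filterRowKey(String row) {
                filterOutRow = !keep.test(row);
                return filterOutRow;
            }
            public boolean includeCell(String row, int value) { return !filterOutRow; }
        };
    }

    /** Stand-in for a single-column value filter. */
    static F valueEquals(int wanted) {
        return new F() {
            public void reset() { }
            public boolean filterRowKey(String row) { return false; }
            public boolean includeCell(String row, int value) { return value == wanted; }
        };
    }

    /** All sub-filters must pass. */
    static F and(F... fs) {
        return new F() {
            public void reset() { for (F f : fs) f.reset(); }
            public boolean filterRowKey(String row) {
                for (F f : fs) if (f.filterRowKey(row)) return true;
                return false;
            }
            public boolean includeCell(String row, int value) {
                for (F f : fs) if (!f.includeCell(row, value)) return false;
                return true;
            }
        };
    }

    /** One sub-filter must pass; filterRowKey short-circuits on the first pass. */
    static F or(F... fs) {
        return new F() {
            public void reset() { for (F f : fs) f.reset(); }
            public boolean filterRowKey(String row) {
                // Later sub-filters never see the row key once one accepts it.
                for (F f : fs) if (!f.filterRowKey(row)) return false;
                return true;
            }
            public boolean includeCell(String row, int value) {
                for (F f : fs) if (f.includeCell(row, value)) return true;
                return false;
            }
        };
    }

    /** Scans row0..row9 with column value i % 2, as in the description. */
    static List<String> scan(F filter) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            String row = "row" + i;
            filter.reset();
            if (filter.filterRowKey(row)) continue;
            if (filter.includeCell(row, i % 2)) out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        Predicate<String> gt4 = r -> r.compareTo("row4") > 0;
        Predicate<String> le4 = r -> r.compareTo("row4") <= 0;
        // filter || subFilterList
        System.out.println(scan(or(rowFilter(gt4), and(rowFilter(le4), valueEquals(0)))));
        // subFilterList || filter
        System.out.println(scan(or(and(rowFilter(le4), valueEquals(0)), rowFilter(gt4))));
    }
}
```

In this model the first ordering yields row0, row2, row4, and row5 through row9, while the second yields all ten rows, because the trailing row filter never sees the row key for row0 through row4 and its flag keeps the value set by reset().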