Viral Bajaria created HBASE-9079:
------------------------------------
Summary: FilterList getNextKeyHint skips rows that should be
included in the results
Key: HBASE-9079
URL: https://issues.apache.org/jira/browse/HBASE-9079
Project: HBase
Issue Type: Bug
Components: Filters
Affects Versions: 0.94.10
Reporter: Viral Bajaria
I hit a weird bug that I am able to reproduce consistently. The problem
arises when a FilterList wraps two filters that each implement the
getNextKeyHint method.
The way the current implementation works is, StoreScanner will call
matcher.getNextKeyHint() whenever it gets a SEEK_NEXT_USING_HINT. This in turn
will call filter.getNextKeyHint() which at this stage is of type FilterList.
The implementation in FilterList iterates through all the filters and keeps the
max KeyValue that it sees. All is fine if only one of the filters wrapped in the
FilterList implements getNextKeyHint, but if several of them implement it,
that's where things get weird.
For example:
- create two filters: one is FuzzyRowFilter and second is ColumnRangeFilter.
Both of them implement getNextKeyHint
- wrap them in FilterList with MUST_PASS_ALL
- FuzzyRowFilter will seek to the correct first row and then pass it to
ColumnRangeFilter which will return the SEEK_NEXT_USING_HINT code.
- Now when getNextKeyHint is called on the FilterList, it calls the one on
FuzzyRowFilter first, which basically says what the next row should be, while
in reality we want the ColumnRangeFilter to give the seek hint.
- The above behavior skips data that should be returned, which I have verified
by using a RowFilter with RegexStringComparator.
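
To make the failure mode concrete, here is a minimal sketch of the "keep the max
hint" selection described above. It uses hypothetical stand-in types
(HintingFilter, MaxHintSketch) and plain strings of the form "row/column" as
keys. It is not the HBase FilterList code, only a simplified model of the
behavior I am describing.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a filter that can produce a seek hint.
// This is NOT the HBase Filter interface.
interface HintingFilter {
    // Returns the key this filter would like the scanner to seek to, or null.
    String nextKeyHint(String currentKey);
}

final class MaxHintSketch {
    // Models the behavior described above: ask every filter and keep the largest hint.
    static String maxHint(List<HintingFilter> filters, String currentKey) {
        String max = null;
        for (HintingFilter f : filters) {
            String hint = f.nextKeyHint(currentKey);
            if (hint != null && (max == null || hint.compareTo(max) > 0)) {
                max = hint;
            }
        }
        return max;
    }

    public static void main(String[] args) {
        // Row-level filter (assumed): hints at the start of the next matching row.
        HintingFilter rowHint = key -> "row2/";
        // Column-level filter (assumed): stays in the current row, hints at column c5.
        HintingFilter columnHint = key -> key.substring(0, key.indexOf('/') + 1) + "c5";

        String hint = maxHint(Arrays.asList(rowHint, columnHint), "row1/c1");
        // The row-level hint sorts after "row1/c5", so it wins and the rest of row1 is skipped.
        System.out.println(hint); // prints "row2/"
    }
}
```

In this model the column filter is the one that asked for the seek, but its hint
loses to the row filter's larger hint, so the columns it wanted in row1 are
never visited.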
I updated the FilterList to maintain state on which filter returns the
SEEK_NEXT_USING_HINT and in getNextKeyHint, I invoke the method on the saved
filter and reset that state. I tested it with my current queries and it works
fine but I need to run the entire test suite to make sure I have not introduced
any regression. In addition to that I need to figure out what the behavior
should be when the operation is MUST_PASS_ONE, but I doubt it should be any
different.
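
The change can be sketched roughly as follows. This is a simplified model and
not the actual patch: SketchReturnCode, SketchFilter, FixedFilter and
TrackingFilterList are hypothetical names, keys are plain strings, and only the
MUST_PASS_ALL-style path is modeled.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, reduced set of return codes; the real filters have more.
enum SketchReturnCode { INCLUDE, SEEK_NEXT_USING_HINT }

// Hypothetical stand-in for a filter; NOT the HBase Filter interface.
interface SketchFilter {
    SketchReturnCode filterKey(String key);
    String nextKeyHint(String key);
}

// Test double that always answers with a fixed code and a fixed hint.
final class FixedFilter implements SketchFilter {
    private final SketchReturnCode code;
    private final String hint;
    FixedFilter(SketchReturnCode code, String hint) { this.code = code; this.hint = hint; }
    @Override public SketchReturnCode filterKey(String key) { return code; }
    @Override public String nextKeyHint(String key) { return hint; }
}

final class TrackingFilterList {
    private final List<SketchFilter> filters;
    // The filter that most recently returned SEEK_NEXT_USING_HINT, if any.
    private SketchFilter seekHintFilter;

    TrackingFilterList(List<SketchFilter> filters) { this.filters = filters; }

    // MUST_PASS_ALL-style pass: remember which filter asked for the seek.
    SketchReturnCode filterKey(String key) {
        for (SketchFilter f : filters) {
            SketchReturnCode code = f.filterKey(key);
            if (code == SketchReturnCode.SEEK_NEXT_USING_HINT) {
                seekHintFilter = f;
                return code;
            }
        }
        return SketchReturnCode.INCLUDE;
    }

    // Ask only the remembered filter for the hint, then reset the state.
    String nextKeyHint(String key) {
        SketchFilter f = seekHintFilter;
        seekHintFilter = null;
        return f == null ? null : f.nextKeyHint(key);
    }

    public static void main(String[] args) {
        TrackingFilterList list = new TrackingFilterList(Arrays.asList(
            new FixedFilter(SketchReturnCode.INCLUDE, "row2/"),
            new FixedFilter(SketchReturnCode.SEEK_NEXT_USING_HINT, "row1/c5")));
        System.out.println(list.filterKey("row1/c1"));   // prints "SEEK_NEXT_USING_HINT"
        System.out.println(list.nextKeyHint("row1/c1")); // prints "row1/c5"
    }
}
```

Here the hint comes from the filter that actually requested the seek, so the
row-level hint no longer overrides it. What MUST_PASS_ONE should do is left open
in this sketch, as it is above.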
Is my understanding of it being a bug correct? Or am I trivializing it and
ignoring something very important? If it's tough to wrap your head around the
explanation, then I can open a JIRA and upload a patch against 0.94 head.