Viral Bajaria created HBASE-9079:
------------------------------------

             Summary: FilterList getNextKeyHint skips rows that should be included in the results
                 Key: HBASE-9079
                 URL: https://issues.apache.org/jira/browse/HBASE-9079
             Project: HBase
          Issue Type: Bug
          Components: Filters
    Affects Versions: 0.94.10
            Reporter: Viral Bajaria


I hit a weird issue/bug and am able to reproduce the error consistently. The 
problem arises when FilterList has two filters where each implements the 
getNextKeyHint method.

In the current implementation, StoreScanner calls matcher.getNextKeyHint() 
whenever it gets a SEEK_NEXT_USING_HINT. This in turn calls 
filter.getNextKeyHint(), where the filter at this stage is the FilterList. 
The implementation in FilterList iterates through all the filters and keeps the 
max KeyValue that it sees. All is fine if only one of the filters wrapped in 
the FilterList implements getNextKeyHint, but if more than one of them 
implements it, that's where things get weird.

For example:
- create two filters: one is FuzzyRowFilter and second is ColumnRangeFilter. 
Both of them implement getNextKeyHint
- wrap them in FilterList with MUST_PASS_ALL
- FuzzyRowFilter will seek to the correct first row and then pass it to 
ColumnRangeFilter which will return the SEEK_NEXT_USING_HINT code.
- Now in FilterList when getNextKeyHint is called, it calls the one on 
FuzzyRowFilter first, which basically says what the next row should be, while 
in reality we want the ColumnRangeFilter to give the seek hint.
- The above behavior skips data that should be returned, which I have verified 
by using a RowFilter with RegexStringComparator.
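
The steps above can be illustrated with a toy model. This is only a sketch in 
plain Java, not HBase code: Key, HintFilter, MaxHintDemo and the two lambda 
filters are hypothetical stand-ins for KeyValue, the filters, and the 
"keep the max hint" loop described earlier, and the row and column names are 
made up.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: none of these types are HBase classes.
final class Key implements Comparable<Key> {
    final String row;
    final String column;

    Key(String row, String column) {
        this.row = row;
        this.column = column;
    }

    // Order by row first, then by column within the row.
    @Override
    public int compareTo(Key other) {
        int byRow = row.compareTo(other.row);
        return byRow != 0 ? byRow : column.compareTo(other.column);
    }

    @Override
    public String toString() {
        return row + "/" + column;
    }
}

// Hypothetical stand-in for a filter that can suggest where to seek next.
interface HintFilter {
    Key getNextKeyHint(Key current);
}

final class MaxHintDemo {
    // Models the behaviour described in the report:
    // ask every filter for a hint and keep the largest one.
    static Key maxHint(List<HintFilter> filters, Key current) {
        Key best = null;
        for (HintFilter f : filters) {
            Key hint = f.getNextKeyHint(current);
            if (hint != null && (best == null || best.compareTo(hint) < 0)) {
                best = hint;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Row filter stand-in: always points at the next matching row.
        HintFilter rowFilter = current -> new Key("row2", "");
        // Column filter stand-in: points at the start of the wanted
        // column range inside the current row.
        HintFilter columnFilter = current -> new Key(current.row, "c5");

        Key current = new Key("row1", "c1");
        Key hint = maxHint(Arrays.asList(rowFilter, columnFilter), current);
        System.out.println("seek to " + hint);
    }
}
```

In this model the program prints "seek to row2/": the row-level hint sorts 
after the column-level hint, so the max wins and the columns from c5 onward in 
row1 are never visited.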

I updated the FilterList to maintain state on which filter returns the 
SEEK_NEXT_USING_HINT and in getNextKeyHint, I invoke the method on the saved 
filter and reset that state. I tested it with my current queries and it works 
fine but I need to run the entire test suite to make sure I have not introduced 
any regression. In addition to that I need to figure out what the behavior 
should be when the operation is MUST_PASS_ONE, but I doubt it should be any 
different.
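
A minimal sketch of the idea, again as a toy model rather than the actual 
patch: Code, SketchFilter, TrackingFilterList and TrackingDemo are hypothetical 
names, keys are plain strings, and only the MUST_PASS_ALL-style case is 
modelled. The list remembers which filter returned SEEK_NEXT_USING_HINT and 
asks only that filter for the hint, then resets the saved state.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: these are not the HBase Filter or FilterList classes.
enum Code { INCLUDE, SEEK_NEXT_USING_HINT }

interface SketchFilter {
    Code filterKeyValue(String key);
    String getNextKeyHint(String key);
}

// MUST_PASS_ALL-style list that remembers which filter asked for a seek.
final class TrackingFilterList {
    private final List<SketchFilter> filters;
    // Filter that last returned SEEK_NEXT_USING_HINT, or null if none did.
    private SketchFilter seekHintFilter;

    TrackingFilterList(List<SketchFilter> filters) {
        this.filters = filters;
    }

    Code filterKeyValue(String key) {
        for (SketchFilter f : filters) {
            Code code = f.filterKeyValue(key);
            if (code == Code.SEEK_NEXT_USING_HINT) {
                seekHintFilter = f; // save state for the follow-up hint call
                return code;
            }
        }
        return Code.INCLUDE;
    }

    String getNextKeyHint(String key) {
        if (seekHintFilter == null) {
            return null; // no filter requested a seek
        }
        String hint = seekHintFilter.getNextKeyHint(key);
        seekHintFilter = null; // reset so a stale filter is never consulted
        return hint;
    }
}

final class TrackingDemo {
    // Helper that builds a filter with fixed behaviour for the demo.
    static SketchFilter fixed(Code code, String hint) {
        return new SketchFilter() {
            public Code filterKeyValue(String key) { return code; }
            public String getNextKeyHint(String key) { return hint; }
        };
    }

    public static void main(String[] args) {
        SketchFilter rowFilter = fixed(Code.INCLUDE, "row2/");
        SketchFilter columnFilter = fixed(Code.SEEK_NEXT_USING_HINT, "row1/c5");
        TrackingFilterList list =
            new TrackingFilterList(Arrays.asList(rowFilter, columnFilter));

        if (list.filterKeyValue("row1/c1") == Code.SEEK_NEXT_USING_HINT) {
            System.out.println("seek to " + list.getNextKeyHint("row1/c1"));
        }
    }
}
```

Here the program prints "seek to row1/c5", because only the filter that asked 
for the seek supplies the hint. How this should interact with MUST_PASS_ONE is 
left open in the sketch.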

Is my understanding of it being a bug correct? Or am I trivializing it and 
ignoring something very important? If it's tough to wrap your head around the 
explanation, then I can open a JIRA and upload a patch against 0.94 head.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
