[
https://issues.apache.org/jira/browse/HBASE-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729738#comment-13729738
]
Viral Bajaria commented on HBASE-9079:
--------------------------------------
[~lhofhansl] Can you review the patch when you get a chance? I have already
deployed it to my production cluster and have not seen any issues.
> FilterList getNextKeyHint skips rows that should be included in the results
> ---------------------------------------------------------------------------
>
> Key: HBASE-9079
> URL: https://issues.apache.org/jira/browse/HBASE-9079
> Project: HBase
> Issue Type: Bug
> Components: Filters
> Affects Versions: 0.94.10
> Reporter: Viral Bajaria
> Attachments: HBASE-9079-0.94.patch, HBASE-9079-trunk.patch
>
>
> I hit a bug that I can reproduce consistently. The problem arises when a
> FilterList contains two filters that each implement the getNextKeyHint
> method.
> In the current implementation, StoreScanner calls
> matcher.getNextKeyHint() whenever it gets a SEEK_NEXT_USING_HINT. This in
> turn calls filter.getNextKeyHint(), where the filter at this stage is the
> FilterList. The implementation in FilterList iterates through all the filters
> and keeps the max KeyValue that it sees. This works when only one of the
> filters wrapped in the FilterList implements getNextKeyHint, but it breaks
> when more than one of them does.
> For example:
> - Create two filters: a FuzzyRowFilter and a ColumnRangeFilter. Both of
> them implement getNextKeyHint.
> - Wrap them in a FilterList with MUST_PASS_ALL.
> - FuzzyRowFilter seeks to the correct first row and then passes it to
> ColumnRangeFilter, which returns the SEEK_NEXT_USING_HINT code.
> - When getNextKeyHint is then called on the FilterList, it calls the one on
> FuzzyRowFilter first, which says what the next row should be, whereas we
> actually want ColumnRangeFilter to provide the seek hint.
> - This behavior skips data that should be returned, which I verified by
> using a RowFilter with a RegexStringComparator.
> I updated FilterList to keep track of which filter returned
> SEEK_NEXT_USING_HINT; in getNextKeyHint, I invoke the method on the saved
> filter and reset that state. I tested it with my current queries and it works
> fine, but I still need to run the entire test suite to make sure I have not
> introduced any regressions. I also need to figure out what the behavior
> should be when the operation is MUST_PASS_ONE, but I doubt it should be any
> different.
> Is my understanding that this is a bug correct? Or am I trivializing it and
> ignoring something important? If the explanation is hard to follow, I can
> open a JIRA and upload a patch against 0.94 head.
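
The hint selection described in the report can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the attached patch and not the HBase API: HintFilter, RowHintFilter, ColumnHintFilter, StatefulList, and the "row/column" string keys are all hypothetical stand-ins for the real Filter, FuzzyRowFilter, ColumnRangeFilter, FilterList, and KeyValue. It compares "keep the max hint from every filter" with "ask only the filter that returned SEEK_NEXT_USING_HINT".

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of the hint selection described in the report; no HBase classes are used. */
class FilterListHintSketch {

    enum ReturnCode { INCLUDE, SEEK_NEXT_USING_HINT }

    /** Hypothetical stand-in for an HBase filter; keys are "row/column" strings. */
    interface HintFilter {
        ReturnCode filterKey(String key);
        String nextKeyHint(String key);
    }

    /** Row-level filter (FuzzyRowFilter-like): accepts every key, always hints a fixed next row. */
    static class RowHintFilter implements HintFilter {
        private final String nextRowKey;
        RowHintFilter(String nextRowKey) { this.nextRowKey = nextRowKey; }
        public ReturnCode filterKey(String key) { return ReturnCode.INCLUDE; }
        public String nextKeyHint(String key) { return nextRowKey; }
    }

    /** Column-level filter (ColumnRangeFilter-like): seeks to minColumn within the current row. */
    static class ColumnHintFilter implements HintFilter {
        private final String minColumn;
        ColumnHintFilter(String minColumn) { this.minColumn = minColumn; }
        public ReturnCode filterKey(String key) {
            String column = key.substring(key.indexOf('/') + 1);
            return column.compareTo(minColumn) < 0
                    ? ReturnCode.SEEK_NEXT_USING_HINT : ReturnCode.INCLUDE;
        }
        public String nextKeyHint(String key) {
            String row = key.substring(0, key.indexOf('/'));
            return row + "/" + minColumn;
        }
    }

    /** Behaviour as described in the report: keep the largest hint offered by any filter. */
    static String maxHint(List<HintFilter> filters, String key) {
        String max = null;
        for (HintFilter f : filters) {
            String hint = f.nextKeyHint(key);
            if (hint != null && (max == null || hint.compareTo(max) > 0)) {
                max = hint;
            }
        }
        return max;
    }

    /** Proposed behaviour: remember which filter asked for the seek and ask only that one. */
    static class StatefulList {
        private final List<HintFilter> filters;
        private HintFilter seekHintFilter; // set when a filter returns SEEK_NEXT_USING_HINT

        StatefulList(List<HintFilter> filters) { this.filters = filters; }

        /** MUST_PASS_ALL-style evaluation: the first filter that wants a seek wins. */
        ReturnCode filterKey(String key) {
            for (HintFilter f : filters) {
                if (f.filterKey(key) == ReturnCode.SEEK_NEXT_USING_HINT) {
                    seekHintFilter = f;
                    return ReturnCode.SEEK_NEXT_USING_HINT;
                }
            }
            return ReturnCode.INCLUDE;
        }

        /** Returns the saved filter's hint and resets the saved state. */
        String nextKeyHint(String key) {
            HintFilter f = seekHintFilter;
            seekHintFilter = null;
            return f == null ? null : f.nextKeyHint(key);
        }
    }

    static List<HintFilter> exampleFilters() {
        return Arrays.<HintFilter>asList(new RowHintFilter("row2/"), new ColumnHintFilter("c"));
    }

    public static void main(String[] args) {
        List<HintFilter> filters = exampleFilters();
        String key = "row1/a";
        System.out.println("max hint:      " + maxHint(filters, key));
        StatefulList list = new StatefulList(filters);
        list.filterKey(key);
        System.out.println("stateful hint: " + list.nextKeyHint(key));
    }
}
```

In this toy setup the scanner sits at "row1/a" and the column filter asks to seek to column "c". Taking the max hint jumps to "row2/" and skips the rest of row1, while the stateful variant seeks to "row1/c" within the same row.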