[ 
https://issues.apache.org/jira/browse/HBASE-5257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481276#comment-13481276
 ] 

Varun Sharma commented on HBASE-5257:
-------------------------------------

Currently, ColumnCountGetFilter and ColumnPaginationFilter suffer from this 
issue: they always undercount when there are multiple versions of a cell (even 
when the column family's max versions is set to 1; I think this is because the 
extra versions exist until a compaction happens). I looked at the 
ScanQueryMatcher/StoreScanner/ColumnTracker code and it seems there is one 
other plausible approach to resolving this. Currently, if a filter wants to 
skip over a KeyValue, it has two options: skip to the next KeyValue, which 
could be in the same column (SKIP), or skip to the next column 
(SEEK_NEXT_COL). Although we give filters both ways of skipping when they 
exclude a value, we don't when they "include" it: INCLUDE always moves on to 
the next KeyValue. I think that probably makes sense for the ColumnTracker, 
since for column tracking we never want to seek across columns after doing an 
INCLUDE, but for filters we probably want symmetry between including and 
excluding KeyValues. So, I was proposing something like:

1) Introduce INCLUDE_AND_SEEK_NEXT_COL to Filter.ReturnCode
2) Introduce INCLUDE_AND_SEEK_NEXT_COL to ScanQueryMatcher.MatchCode
3) Modify StoreScanner accordingly to seek to next column after the include and 
also link the above two types in the match() function
4) Finally, modify ColumnPaginationFilter to return SEEK_NEXT_COL and 
INCLUDE_AND_SEEK_NEXT_COL instead of SKIP and INCLUDE, respectively. Similarly 
for ColumnCountGetFilter.
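
A rough sketch of how steps 1 to 4 might behave, as a self-contained toy model 
rather than actual HBase code. Every type here (PaginationSketch, Cell, 
PaginationFilter, the scan() loop) is a hypothetical stand-in, and the model 
has no ColumnTracker, so it only illustrates how the filter's counter reacts 
to multiple versions under the current and the proposed return codes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: none of these types are HBase classes.
class PaginationSketch {

    // INCLUDE_AND_SEEK_NEXT_COL is the proposed addition (step 1).
    enum ReturnCode { INCLUDE, SKIP, SEEK_NEXT_COL, INCLUDE_AND_SEEK_NEXT_COL, NEXT_ROW }

    // Hypothetical stand-in for a KeyValue: column qualifier plus version timestamp.
    static final class Cell {
        final String column;
        final long timestamp;
        Cell(String column, long timestamp) { this.column = column; this.timestamp = timestamp; }
    }

    // Simplified pagination filter; seekColumns selects the proposed codes (step 4).
    static final class PaginationFilter {
        private final int limit;
        private final int offset;
        private final boolean seekColumns;
        private int count = 0;

        PaginationFilter(int limit, int offset, boolean seekColumns) {
            this.limit = limit;
            this.offset = offset;
            this.seekColumns = seekColumns;
        }

        ReturnCode filterKeyValue(Cell cell) {
            if (count >= offset + limit) return ReturnCode.NEXT_ROW;
            boolean include = count >= offset;
            count++; // counts every cell the filter is shown
            if (seekColumns) {
                return include ? ReturnCode.INCLUDE_AND_SEEK_NEXT_COL : ReturnCode.SEEK_NEXT_COL;
            }
            return include ? ReturnCode.INCLUDE : ReturnCode.SKIP;
        }
    }

    // Stand-in for the scanner loop (step 3): honors "seek to next column" after an include too.
    static List<String> scan(List<Cell> cells, PaginationFilter filter) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < cells.size()) {
            Cell cell = cells.get(i);
            ReturnCode code = filter.filterKeyValue(cell);
            if (code == ReturnCode.NEXT_ROW) break;
            if (code == ReturnCode.INCLUDE || code == ReturnCode.INCLUDE_AND_SEEK_NEXT_COL) {
                out.add(cell.column + "@" + cell.timestamp);
            }
            if (code == ReturnCode.SEEK_NEXT_COL || code == ReturnCode.INCLUDE_AND_SEEK_NEXT_COL) {
                // Jump past the remaining versions of this column.
                while (i < cells.size() && cells.get(i).column.equals(cell.column)) i++;
            } else {
                i++;
            }
        }
        return out;
    }

    // One row, sorted by column and then newest version first; "a" and "c" have two versions.
    static List<Cell> sampleRow() {
        return Arrays.asList(new Cell("a", 3), new Cell("a", 2), new Cell("b", 5),
                new Cell("c", 4), new Cell("c", 1));
    }

    public static void main(String[] args) {
        // limit = 2, offset = 1: the intended result is columns b and c.
        System.out.println(scan(sampleRow(), new PaginationFilter(2, 1, false))); // [a@2, b@5]
        System.out.println(scan(sampleRow(), new PaginationFilter(2, 1, true)));  // [b@5, c@4]
    }
}
```

With SKIP/INCLUDE the older version of column a consumes a slot in the page, 
whereas with the seek-based codes the filter sees each column once, so the 
offset and limit apply to columns rather than to versions.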

This might be a more direct way of resolving this issue and would avoid the 
column tracker sandwich between two layers of filters. What do you think, Lars?

Varun
                
> Allow filter to be evaluated after version handling
> ---------------------------------------------------
>
>                 Key: HBASE-5257
>                 URL: https://issues.apache.org/jira/browse/HBASE-5257
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Lars Hofhansl
>
> There are various use cases and filter types where evaluating the filter 
> before versions are handled either does not make sense or makes filter 
> handling more complicated.
> Also see this comment in ScanQueryMatcher:
> {code}
>     /**
>      * Filters should be checked before checking column trackers. If we do
>      * otherwise, as was previously being done, ColumnTracker may increment its
>      * counter for even that KV which may be discarded later on by Filter. This
>      * would lead to incorrect results in certain cases.
>      */
> {code}
> So we had Filters after the column trackers (which do the version checking), 
> and then moved them.
> This should be at the discretion of the Filter.
> We could either add a new method to FilterBase (maybe excludeVersions() or 
> something), or have a new Filter wrapper (like WhileMatchFilter) that should 
> only be used as the outermost filter and indicates the same (maybe 
> ExcludeVersionsFilter).
> See latest comments on HBASE-5229 for motivation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
