[ https://issues.apache.org/jira/browse/HBASE-6757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerry Lam updated HBASE-6757:
-----------------------------

    Attachment: DisplayFilter.java
                CopyOfTestColumnPrefixFilter.java

The attached TestColumnPrefixFilter (CopyOfTestColumnPrefixFilter.java)
demonstrates the inefficiency of the scan by using DisplayFilter, which logs
every call to the filter's methods.


testEfficiencyWithoutFliterList scans only 3 KeyValues before returning, whereas
testEfficiencyWithFliterList scans 10002 KeyValues. The only difference between
the two tests is that testEfficiencyWithFliterList wraps the ColumnPrefixFilter
in a FilterList and passes the FilterList to the scan instead of the
ColumnPrefixFilter itself.

For this to work, DisplayFilter must first be deployed to HBase. Its log output
is written to the HMaster log.
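The gap between the two tests can be illustrated with a toy model. This is a sketch, not HBase code: ReturnCode, CellFilter, PrefixFilter, MustPassAllList and countFilterCalls are hypothetical stand-ins, MustPassAllList simply hard-codes the NEXT_ROW-to-SKIP rewrite described in this issue, and the data (one row, 1000 columns, 10 versions per column) is invented rather than taken from the attached test. It counts how many cells the top-level filter is consulted on, which is roughly what DisplayFilter's log makes visible.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the behaviour reported in HBASE-6757. None of these types are
// HBase classes; they are hypothetical stand-ins for illustration only.
public class FilterListScanModel {

    enum ReturnCode { INCLUDE, SKIP, NEXT_ROW }

    interface CellFilter {
        ReturnCode filterCell(String qualifier);
    }

    // Simplified prefix filter: INCLUDE on a prefix match, NEXT_ROW once the
    // qualifier sorts past every possible match, SKIP for qualifiers before it.
    static final class PrefixFilter implements CellFilter {
        private final String prefix;

        PrefixFilter(String prefix) { this.prefix = prefix; }

        @Override
        public ReturnCode filterCell(String qualifier) {
            if (qualifier.startsWith(prefix)) {
                return ReturnCode.INCLUDE;
            }
            return qualifier.compareTo(prefix) > 0 ? ReturnCode.NEXT_ROW : ReturnCode.SKIP;
        }
    }

    // Assumed MUST_PASS_ALL combination, as described in the issue: any
    // non-INCLUDE answer from a member, including NEXT_ROW, becomes SKIP.
    static final class MustPassAllList implements CellFilter {
        private final List<CellFilter> filters;

        MustPassAllList(List<CellFilter> filters) { this.filters = filters; }

        @Override
        public ReturnCode filterCell(String qualifier) {
            for (CellFilter f : filters) {
                if (f.filterCell(qualifier) != ReturnCode.INCLUDE) {
                    return ReturnCode.SKIP; // the NEXT_ROW hint is lost here
                }
            }
            return ReturnCode.INCLUDE;
        }
    }

    // Scans one row whose qualifiers are already sorted, each stored with
    // `versions` versions, and returns how many cells the filter was asked about.
    static int countFilterCalls(List<String> sortedQualifiers, int versions, CellFilter filter) {
        int calls = 0;
        for (String q : sortedQualifiers) {
            for (int v = 0; v < versions; v++) {
                calls++;
                if (filter.filterCell(q) == ReturnCode.NEXT_ROW) {
                    return calls; // jump straight to the next row
                }
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        List<String> qualifiers = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            qualifiers.add(String.format("col%04d", i)); // col0000 .. col0999
        }
        CellFilter prefix = new PrefixFilter("col0000");
        CellFilter list = new MustPassAllList(List.of(prefix));

        System.out.println("direct:     " + countFilterCalls(qualifiers, 10, prefix)); // 11
        System.out.println("FilterList: " + countFilterCalls(qualifiers, 10, list));   // 10000
    }
}
```

Under these assumptions the bare filter is consulted 11 times (the 10 versions of col0000 plus the first cell of col0001, which yields NEXT_ROW), while the wrapped filter is consulted for all 10000 cells in the row because every NEXT_ROW is turned into SKIP.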
                
> Very inefficient behaviour of scan using FilterList
> ---------------------------------------------------
>
>                 Key: HBASE-6757
>                 URL: https://issues.apache.org/jira/browse/HBASE-6757
>             Project: HBase
>          Issue Type: Improvement
>          Components: filters
>    Affects Versions: 0.90.6
>            Reporter: Jerry Lam
>         Attachments: CopyOfTestColumnPrefixFilter.java, DisplayFilter.java
>
>
> Scans are very inefficient when used with a FilterList.
> When Operator.MUST_PASS_ALL is used, FilterList rewrites a member filter's
> NEXT_ROW return code to SKIP.
> This happens with ColumnPrefixFilter. Even though ColumnPrefixFilter returns
> NEXT_ROW because no further match can be found in the row, the scan goes on to
> examine every version of the current column and every remaining column of the
> row, because FilterList has rewritten the ReturnCode from NEXT_ROW to SKIP.
> This is particularly inefficient when a column has many versions, because the
> check is repeated for every version of the column instead of being decided
> once from the column qualifier.
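A rough per-row cost model makes the dependence on versions explicit. It assumes, as the description states, that the filter is consulted once per stored version, and that a row holds n columns with v versions each, sorted as k columns before the prefix, m columns matching it, and at least one column after it. The symbols n, k, m, v and C are introduced here for illustration only:

```latex
C_{\text{NEXT\_ROW honoured}} = (k + m)\,v + 1,
\qquad
C_{\text{rewritten to SKIP}} = n\,v,
\qquad
C_{\text{rewritten to SKIP}} - C_{\text{NEXT\_ROW honoured}} = (n - k - m)\,v - 1
```

For hypothetical values n = 500, k = 2, m = 3 and v = 5, honouring NEXT_ROW costs 5 * 5 + 1 = 26 filter calls, while the rewrite to SKIP costs 500 * 5 = 2500. The wasted work grows linearly with both the number of trailing columns and the number of versions.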

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
