[ https://issues.apache.org/jira/browse/HBASE-20636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165113#comment-17165113 ]

Alex Batyrshin commented on HBASE-20636:
----------------------------------------

Any ideas why this feature only works at scanner opening time and not inside
an already opened scan, in StoreFileScanner.requestSeek()?

The only Bloom filter type used inside an already opened scan is ROWCOL:
[https://github.com/apache/hbase/blob/branch-2/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java#L384-L396]
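For comparison, here is a simplified, hypothetical model (not HBase code) of which part of a cell each Bloom filter type is keyed on. ROWCOL needs both a row and a column to probe with, while the two prefix types introduced in HBASE-20636 need only part of the row. The enum, the method name `bloomKey`, the plain concatenation used for ROWCOL, and the handling of rows shorter than the prefix or lacking the delimiter are all assumptions; HBase builds these keys in its own internal format.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Simplified, hypothetical model of the key each Bloom filter type is built from.
// HBase encodes these keys in its own internal format; this only shows which parts of a cell are used.
public class BloomKeySketch {

    enum BloomType { ROW, ROWCOL, ROWPREFIX_FIXED_LENGTH, ROWPREFIX_DELIMITED }

    static byte[] bloomKey(BloomType type, byte[] row, byte[] qualifier,
                           int prefixLength, byte delimiter) {
        switch (type) {
            case ROW:
                // Whole row: enough to answer a Get on one row.
                return row;
            case ROWCOL: {
                // Row plus column (assumption: plain concatenation); needs a specific column to probe with.
                byte[] key = Arrays.copyOf(row, row.length + qualifier.length);
                System.arraycopy(qualifier, 0, key, row.length, qualifier.length);
                return key;
            }
            case ROWPREFIX_FIXED_LENGTH:
                // First prefixLength bytes (assumption: a shorter row uses the whole row).
                return Arrays.copyOf(row, Math.min(prefixLength, row.length));
            case ROWPREFIX_DELIMITED:
                // Bytes before the first delimiter (assumption: no delimiter means the whole row).
                for (int i = 0; i < row.length; i++) {
                    if (row[i] == delimiter) return Arrays.copyOf(row, i);
                }
                return row;
            default:
                throw new IllegalArgumentException("unknown type: " + type);
        }
    }

    public static void main(String[] args) {
        byte[] row = "user42#1700000000".getBytes(StandardCharsets.UTF_8);
        byte[] qualifier = "score".getBytes(StandardCharsets.UTF_8);
        for (BloomType t : BloomType.values()) {
            byte[] key = bloomKey(t, row, qualifier, 6, (byte) '#');
            System.out.println(t + " -> " + new String(key, StandardCharsets.UTF_8));
        }
    }
}
```

Running `main` prints `user42#1700000000` for ROW, `user42#1700000000score` for ROWCOL, and `user42` for both prefix types (fixed length 6, delimiter `#`).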

> Introduce two bloom filter type : ROWPREFIX_FIXED_LENGTH and 
> ROWPREFIX_DELIMITED
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-20636
>                 URL: https://issues.apache.org/jira/browse/HBASE-20636
>             Project: HBase
>          Issue Type: New Feature
>          Components: HFile, regionserver, Scanners
>            Reporter: Guangxu Cheng
>            Assignee: Guangxu Cheng
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 2.2.0
>
>         Attachments: HBASE-20636.master.001.patch, 
> HBASE-20636.master.002.patch, HBASE-20636.master.003.patch, 
> HBASE-20636.master.004.patch, HBASE-20636.master.005.patch
>
>
> As we all know, HBase uses Bloom filters (ROW and ROWCOL) to skip unnecessary
> files and improve read performance. However, they only support Get, not
> Scan.
> In our company (Tencent), many users, such as Tencent Game, need to scan all
> rows that share the same prefix. Each game user's operational records are
> written into HBase, and each user has many records. The rowkey is
> constructed as userid + '#' + timestamp, so we can scan all records for a
> given user over a specified period.
> For this scenario, we designed the prefix Bloom filter. If the startRow and
> stopRow of the Scan have a valid common prefix, the scan can use the Bloom
> filter to skip files, which improves scan performance.
> This feature has now been running on our cluster for over a year, and scan
> performance for this scenario has more than doubled.
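The pruning rule in the description can be modeled in a few lines. This is a toy sketch, not HBase's implementation: the class, the double-hashing scheme, and the helper names (`delimitedPrefix`, `scanPrefix`, `fileMayMatch`) are hypothetical, and only the delimited variant is modeled, using the `userid#timestamp` rowkeys from the description. Each store file keeps a Bloom filter over its row prefixes; a scan whose startRow and stopRow share a prefix probes that filter and skips the file on a negative answer, while a scan without a common prefix has to read every file.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

// Toy model of scan-time file pruning with a row-prefix Bloom filter.
// Hypothetical names throughout; this is not HBase's implementation.
public class PrefixBloomScanSketch {

    // Minimal Bloom filter using double hashing: no false negatives, rare false positives.
    static final class BloomFilter {
        private final BitSet bits;
        private final int m;  // number of bits
        private final int k;  // number of probes per key

        BloomFilter(int m, int k) {
            this.bits = new BitSet(m);
            this.m = m;
            this.k = k;
        }

        private static int fnv1a(byte[] key) {
            int h = 0x811C9DC5;
            for (byte b : key) {
                h ^= (b & 0xFF);
                h *= 0x01000193;
            }
            return h;
        }

        private int index(byte[] key, int i) {
            int h1 = Arrays.hashCode(key);
            int h2 = fnv1a(key) | 1;  // force odd so successive probes differ
            return Math.floorMod(h1 + i * h2, m);
        }

        void add(byte[] key) {
            for (int i = 0; i < k; i++) bits.set(index(key, i));
        }

        boolean mightContain(byte[] key) {
            for (int i = 0; i < k; i++) {
                if (!bits.get(index(key, i))) return false;
            }
            return true;
        }
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Row prefix up to (not including) the first delimiter, or null if there is none.
    static byte[] delimitedPrefix(byte[] row, byte delimiter) {
        for (int i = 0; i < row.length; i++) {
            if (row[i] == delimiter) return Arrays.copyOf(row, i);
        }
        return null;
    }

    // The prefix a scan can probe with: only if startRow and stopRow share it.
    static byte[] scanPrefix(byte[] startRow, byte[] stopRow, byte delimiter) {
        byte[] a = delimitedPrefix(startRow, delimiter);
        byte[] b = delimitedPrefix(stopRow, delimiter);
        return (a != null && Arrays.equals(a, b)) ? a : null;
    }

    // false means the file holds no rows in [startRow, stopRow) and can be skipped.
    static boolean fileMayMatch(BloomFilter fileBloom, byte[] startRow, byte[] stopRow, byte delimiter) {
        byte[] prefix = scanPrefix(startRow, stopRow, delimiter);
        if (prefix == null) return true;  // no usable prefix: the file must be read
        return fileBloom.mightContain(prefix);
    }

    public static void main(String[] args) {
        byte delim = (byte) '#';
        BloomFilter file = new BloomFilter(1 << 12, 3);
        // rowkey = userid + '#' + timestamp; the filter stores each row's prefix
        for (String row : new String[] {"user1#1000", "user1#1005", "user2#1001"}) {
            byte[] p = delimitedPrefix(bytes(row), delim);
            if (p != null) file.add(p);
        }
        // All records for user1 in a time window: start and stop share "user1".
        System.out.println(fileMayMatch(file, bytes("user1#1000"), bytes("user1#2000"), delim)); // true
        // A scan spanning two users has no common prefix, so the file is not pruned.
        System.out.println(fileMayMatch(file, bytes("user1#0"), bytes("user9#0"), delim));       // true
        // A user absent from this file: usually false, unless a false positive occurs.
        System.out.println(fileMayMatch(file, bytes("user7#0"), bytes("user7#9"), delim));
    }
}
```

Because a Bloom filter has no false negatives, skipping a file on a negative answer never drops matching rows; a false positive only costs an unnecessary file read.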



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
