[jira] [Comment Edited] (HBASE-20636) Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED

Allan Yang (JIRA) Thu, 24 May 2018 18:18:06 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-20636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490078#comment-16490078
 ]


Allan Yang edited comment on HBASE-20636 at 5/25/18 1:16 AM:
-------------------------------------------------------------

{quote}
Does this require a HFile version increment?
{quote}
[~apurtell], I just reviewed the patch; it does not need an HFile version 
increment.
{quote}
There should also be a fallback. We can at least process the HFile without 
blooms if we don't recognize the bloom filter type.
{quote}
Well, I think a fallback is hard, because we read the bloom filter type like this 
in StoreFileReader.loadFileInfo():
{code:java}
byte[] b = fi.get(BLOOM_FILTER_TYPE_KEY);
if (b != null) {
  // Enum.valueOf throws IllegalArgumentException for an unrecognized name
  bloomFilterType = BloomType.valueOf(Bytes.toString(b));
}
{code}
If an older RegionServer does not recognize the bloom filter type, it 
will throw an IllegalArgumentException here.
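To make the failure mode concrete, here is a minimal, self-contained Java sketch. It does not use HBase classes: `LegacyBloomType` is a hypothetical stand-in for the `BloomType` enum as an older RegionServer would know it (without the new prefix types), and `parseStrict`/`parseOrNone` are hypothetical helpers. `Enum.valueOf` throws `IllegalArgumentException` for a name the enum does not define, which is what happens in the snippet above; catching it and treating the file as having no bloom filter is one possible fallback.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not HBase code: shows why BloomType.valueOf(...) fails on
// an unknown name and how a reader could fall back to "no bloom filter".
public class BloomTypeFallback {

  // Hypothetical stand-in for the bloom types an older RegionServer knows about.
  enum LegacyBloomType { NONE, ROW, ROWCOL }

  // Strict parse, mirroring the pattern in StoreFileReader.loadFileInfo():
  // throws IllegalArgumentException for a name the enum does not define.
  static LegacyBloomType parseStrict(byte[] b) {
    return LegacyBloomType.valueOf(new String(b, StandardCharsets.UTF_8));
  }

  // Tolerant parse (hypothetical): an unrecognized type is treated as NONE,
  // so the HFile could still be read, just without its bloom filter.
  static LegacyBloomType parseOrNone(byte[] b) {
    if (b == null) {
      return LegacyBloomType.NONE;
    }
    try {
      return parseStrict(b);
    } catch (IllegalArgumentException e) {
      return LegacyBloomType.NONE;
    }
  }

  public static void main(String[] args) {
    byte[] known = "ROW".getBytes(StandardCharsets.UTF_8);
    byte[] unknown = "ROWPREFIX".getBytes(StandardCharsets.UTF_8);

    System.out.println(parseOrNone(known));   // ROW
    System.out.println(parseOrNone(unknown)); // NONE
    try {
      parseStrict(unknown);
    } catch (IllegalArgumentException e) {
      System.out.println("strict parse failed: " + e.getMessage());
    }
  }
}
```

Such a fallback would only help readers that already contain it; RegionServer versions that have already been released would presumably still take the strict path and fail.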


> Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED
> -------------------------------------------------------------------
>
>                 Key: HBASE-20636
>                 URL: https://issues.apache.org/jira/browse/HBASE-20636
>             Project: HBase
>          Issue Type: New Feature
>          Components: regionserver
>            Reporter: Guangxu Cheng
>            Assignee: Guangxu Cheng
>            Priority: Major
>         Attachments: HBASE-20636.master.001.patch, 
> HBASE-20636.master.002.patch
>
>
> As we all know, HBase uses bloom filters (ROW and ROWCOL) to skip unnecessary 
> files and improve read performance. However, they only support Get, not 
> Scan.
> In our company (Tencent), many users, such as Tencent Game, need to scan all 
> rows with the same prefix. Operational records of game users are written 
> into HBase; each user has many records, and the rowkey is constructed as 
> userid+'#'+timestamp, so we can scan all records of a given user for a 
> specified period.
> For this scenario, we designed the prefix bloom filter. If the startRow and 
> stopRow of a Scan share a valid common prefix, the scan can use the bloom 
> filter to skip files, which improves scan performance.
> This feature has been running on our cluster for over a year, and scan 
> performance for this scenario has more than doubled.
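As an illustration of the idea in the description, here is a minimal Java sketch. It is not code from the attached patches: the helper names, the choice to return `null` when no prefix can be derived, and the exact rules for the two key types are assumptions. It builds rowkeys as `userid + '#' + timestamp`, derives the key a ROWPREFIX-style (fixed-length) or ROWPREFIX_DELIMITED-style filter might index, and returns a key for a Scan only when startRow and stopRow map to the same prefix.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch, not the HBASE-20636 patch: illustrates how a prefix
// bloom key is derived from a rowkey and when a Scan could consult it.
public class PrefixBloomSketch {

  // Rowkey layout from the issue description: userid + '#' + timestamp.
  static byte[] rowKey(String userId, long timestamp) {
    return (userId + "#" + timestamp).getBytes(StandardCharsets.UTF_8);
  }

  // ROWPREFIX-style key (assumed): the first prefixLength bytes of the row,
  // or null if the row is too short to have such a prefix.
  static byte[] fixedPrefix(byte[] row, int prefixLength) {
    return row.length < prefixLength ? null : Arrays.copyOf(row, prefixLength);
  }

  // ROWPREFIX_DELIMITED-style key (assumed): the bytes before the first
  // delimiter, or null if the row contains no delimiter.
  static byte[] delimitedPrefix(byte[] row, byte delimiter) {
    for (int i = 0; i < row.length; i++) {
      if (row[i] == delimiter) {
        return Arrays.copyOf(row, i);
      }
    }
    return null;
  }

  // Returns the key to probe the bloom filter with, or null to skip it.
  static byte[] scanBloomKeyFixed(byte[] startRow, byte[] stopRow, int prefixLength) {
    byte[] a = fixedPrefix(startRow, prefixLength);
    byte[] b = fixedPrefix(stopRow, prefixLength);
    return (a != null && Arrays.equals(a, b)) ? a : null;
  }

  // Same rule for the delimited variant: both bounds must yield one prefix.
  static byte[] scanBloomKeyDelimited(byte[] startRow, byte[] stopRow, byte delimiter) {
    byte[] a = delimitedPrefix(startRow, delimiter);
    byte[] b = delimitedPrefix(stopRow, delimiter);
    return (a != null && Arrays.equals(a, b)) ? a : null;
  }

  public static void main(String[] args) {
    byte[] start = rowKey("user42", 1527000000000L);
    byte[] stop = rowKey("user42", 1527100000000L);
    byte[] key = scanBloomKeyDelimited(start, stop, (byte) '#');
    System.out.println(new String(key, StandardCharsets.UTF_8)); // user42

    byte[] otherStop = rowKey("user43", 0L);
    System.out.println(scanBloomKeyDelimited(start, otherStop, (byte) '#')); // null
  }
}
```

The check is conservative: if startRow and stopRow both begin with the same prefix, every row in the range [startRow, stopRow) begins with it too, so probing the bloom filter with that prefix cannot wrongly exclude a file that holds matching rows. An empty stopRow (scan to the end of the table) yields no prefix, so the sketch skips the bloom filter in that case.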



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
