[ https://issues.apache.org/jira/browse/HBASE-20636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Guangxu Cheng updated HBASE-20636:
----------------------------------
Release Note:
Adds two Bloom filter types: ROWPREFIX_FIXED_LENGTH and ROWPREFIX_DELIMITED.
1. ROWPREFIX_FIXED_LENGTH: the prefix is a fixed number of leading bytes of the row key, set with RowPrefixBloomFilter.prefix_length.
2. ROWPREFIX_DELIMITED: the prefix is determined by a delimiter, set with RowPrefixDelimitedBloomFilter.delimiter.
Each type requires its parameter in the column family CONFIGURATION. If the parameter is missing, table creation fails.
Examples:
create 't1', {NAME => 'f1', BLOOMFILTER => 'ROWPREFIX_FIXED_LENGTH', CONFIGURATION => {'RowPrefixBloomFilter.prefix_length' => '10'}}
create 't1', {NAME => 'f1', BLOOMFILTER => 'ROWPREFIX_DELIMITED', CONFIGURATION => {'RowPrefixDelimitedBloomFilter.delimiter' => '#'}}
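The following minimal Ruby sketch shows which part of a row key each type would add to the Bloom filter. It rests on two assumptions, not on HBase's actual code:
- ROWPREFIX_FIXED_LENGTH keeps the first prefix_length bytes of the row key, or the whole key if the key is shorter.
- ROWPREFIX_DELIMITED keeps the bytes before the first occurrence of the delimiter, or the whole key if the delimiter is absent.

The method names fixed_length_prefix and delimited_prefix are hypothetical and are not part of the HBase API.

```ruby
# Hypothetical helpers (not HBase API) illustrating which bytes of a row key
# each row-prefix Bloom filter type would insert into the filter.

# ROWPREFIX_FIXED_LENGTH (assumed behavior): the first prefix_length bytes,
# or the whole key when it is shorter than prefix_length.
def fixed_length_prefix(row, prefix_length)
  row.byteslice(0, [prefix_length, row.bytesize].min)
end

# ROWPREFIX_DELIMITED (assumed behavior): the bytes before the first
# delimiter, or the whole key when the delimiter does not occur.
def delimited_prefix(row, delimiter)
  idx = row.b.index(delimiter.b) # byte offset, since both strings are binary
  return row if idx.nil?
  row.byteslice(0, idx)
end

row = 'user0042#1527811200'
puts fixed_length_prefix(row, 8) # user0042
puts delimited_prefix(row, '#')  # user0042
```

With the row key layout userid + '#' + timestamp, both types produce the same prefix when every userid has the same length. ROWPREFIX_DELIMITED also works when userids vary in length.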
Summary: Introduce two bloom filter type : ROWPREFIX_FIXED_LENGTH and
ROWPREFIX_DELIMITED (was: Introduce two bloom filter type : ROWPREFIX and
ROWPREFIX_DELIMITED)
Added the release note. Thanks [~anoop.hbase], and thanks [~apurtell] for the commit. Thanks to all the reviewers.
> Introduce two bloom filter type : ROWPREFIX_FIXED_LENGTH and
> ROWPREFIX_DELIMITED
> --------------------------------------------------------------------------------
>
> Key: HBASE-20636
> URL: https://issues.apache.org/jira/browse/HBASE-20636
> Project: HBase
> Issue Type: New Feature
> Components: HFile, regionserver, scan
> Reporter: Guangxu Cheng
> Assignee: Guangxu Cheng
> Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-20636.master.001.patch,
> HBASE-20636.master.002.patch, HBASE-20636.master.003.patch,
> HBASE-20636.master.004.patch, HBASE-20636.master.005.patch
>
>
> HBase uses Bloom filters (ROW and ROWCOL) to skip store files that cannot
> contain the requested data, which improves read performance. However, they
> only help Get requests, not Scans.
> At our company (Tencent), many users, such as Tencent Games, need to scan
> all rows that share a prefix. Operational records of game users are written
> into HBase, and each user has many records. The row key is constructed as
> userid + '#' + timestamp, so all records for a given user within a specified
> period can be read with a single scan.
> For this scenario, we designed the row-prefix Bloom filter. If the startRow
> and stopRow of a Scan share a valid common prefix, the scan can use the Bloom
> filter to skip store files, which improves scan performance.
> This feature has been running on our cluster for over a year, and scan
> performance for this scenario has more than doubled.
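The scan-side condition above ("a valid common prefix" of startRow and stopRow) can be sketched for ROWPREFIX_FIXED_LENGTH as follows. This is a minimal Ruby sketch, not HBase's implementation. It assumes the Bloom filter is consulted only when startRow and stopRow share at least prefix_length leading bytes. In that case the probe key is the first prefix_length bytes of startRow. Otherwise the Bloom filter cannot rule out the store file, and the file must be read. The method names common_prefix_length and bloom_probe_prefix are hypothetical.

```ruby
# Hypothetical sketch (not HBase API): decide whether a scan over
# [start_row, stop_row) can use a ROWPREFIX_FIXED_LENGTH Bloom filter.

# Number of leading bytes shared by two row keys.
def common_prefix_length(a, b)
  a = a.b
  b = b.b
  n = 0
  n += 1 while n < a.bytesize && n < b.bytesize && a.getbyte(n) == b.getbyte(n)
  n
end

# Returns the prefix to probe the Bloom filter with, or nil when the scan
# range does not pin down a single prefix of prefix_length bytes (the store
# file then cannot be skipped on the basis of the Bloom filter).
def bloom_probe_prefix(start_row, stop_row, prefix_length)
  return nil if common_prefix_length(start_row, stop_row) < prefix_length
  start_row.byteslice(0, prefix_length)
end

# All records of one user within a time range (row key = userid + '#' + timestamp).
p bloom_probe_prefix('user0042#1527811200', 'user0042#1530403200', 8) # "user0042"
# A range spanning two users cannot use the filter.
p bloom_probe_prefix('user0042#1527811200', 'user0043#1527811200', 8) # nil
```

The design point is that a Bloom filter answers only exact-membership questions. A scan can use it only when every row in the range maps to the same prefix.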