[ https://issues.apache.org/jira/browse/HBASE-29842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jaehui Lee updated HBASE-29842:
-------------------------------
Description:
This patch implements the Ribbon Filter proposed in HBASE-27266.
[Design Docs|https://github.com/apache/hbase/blob/HBASE-29842/dev-support/design-docs/HBASE-29842%20Ribbon%20Filter%20Design.md]
h2. Summary
Ribbon Filter is a space-efficient alternative to Bloom Filter, achieving
approximately 30% space savings while maintaining comparable query performance.
- Bloom Filter requires ~9.6 bits/key for 1% FPR (44% overhead vs. the theoretical minimum)
- Ribbon Filter achieves ~7.3 bits/key for 1% FPR (~10% overhead)
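The overhead figures above can be checked against the information-theoretic lower bound of log2(1/FPR) bits per key, which is about 6.64 bits/key at 1% FPR. The following is a minimal sketch of that arithmetic only; the class and method names are hypothetical and are not part of the patch.

{code:java}
public class FilterOverhead {
    // Lower bound on bits per key for any approximate-membership filter
    // with the given false positive rate: log2(1 / fpr).
    public static double theoreticalMinBitsPerKey(double fpr) {
        return Math.log(1.0 / fpr) / Math.log(2.0);
    }

    // Space overhead of a filter, in percent, relative to that lower bound.
    public static double overheadPercent(double bitsPerKey, double fpr) {
        return (bitsPerKey / theoreticalMinBitsPerKey(fpr) - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        double fpr = 0.01; // 1% false positive rate, as in the summary
        System.out.printf("minimum: %.2f bits/key%n", theoreticalMinBitsPerKey(fpr));
        System.out.printf("Bloom  (9.6 bits/key) overhead: %.1f%%%n", overheadPercent(9.6, fpr));
        System.out.printf("Ribbon (7.3 bits/key) overhead: %.1f%%%n", overheadPercent(7.3, fpr));
    }
}
{code}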
h2. Configuration
A new enum {{BloomFilterImpl}} is added to select the filter implementation
independently from {{BloomType}}:
{code:java}
public enum BloomFilterImpl {
  BLOOM,  // Traditional Bloom filter (default)
  RIBBON  // Ribbon filter (more space-efficient)
}
{code}
h3. Per-Table Configuration
*HBase Shell:*
{code:ruby}
# Row-based Ribbon filter
create 'mytable', {NAME => 'cf', BLOOMFILTER => 'ROW', BLOOMFILTER_IMPL => 'RIBBON'}
# Alternatively, a row+column-based Ribbon filter
create 'mytable', {NAME => 'cf', BLOOMFILTER => 'ROWCOL', BLOOMFILTER_IMPL => 'RIBBON'}
{code}
*Java API:*
{code:java}
ColumnFamilyDescriptor cfd = ColumnFamilyDescriptorBuilder
    .newBuilder(Bytes.toBytes("cf"))
    .setBloomFilterType(BloomType.ROW)
    .setBloomFilterImpl(BloomFilterImpl.RIBBON)
    .build();
{code}
h3. Global Configuration
A global default can be set in {{hbase-site.xml}} (the value is case-insensitive):
{code:xml}
<property>
  <name>io.storefile.bloom.filter.impl</name>
  <value>RIBBON</value>
</property>
{code}
When both global and per-table settings exist, the per-table setting takes
precedence.
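The precedence rule can be illustrated with a small, self-contained sketch. This is not the code from the patch: the {{resolve}} helper and its class are hypothetical, the enum is redeclared locally so the example compiles on its own, and it assumes the per-table value is normalized the same case-insensitive way as the global one.

{code:java}
import java.util.Locale;

public class BloomFilterImplResolver {
    // Local copy of the enum from the description, so this sketch is standalone.
    public enum BloomFilterImpl { BLOOM, RIBBON }

    // Hypothetical helper: a per-table value, when present, overrides the
    // global value; when neither is set, fall back to the BLOOM default.
    public static BloomFilterImpl resolve(String perTableValue, String globalValue) {
        String chosen = perTableValue != null ? perTableValue : globalValue;
        if (chosen == null) {
            return BloomFilterImpl.BLOOM;
        }
        // Case-insensitive parsing (assumed to apply to both sources).
        return BloomFilterImpl.valueOf(chosen.trim().toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, null));        // no settings at all
        System.out.println(resolve(null, "ribbon"));    // global setting only
        System.out.println(resolve("BLOOM", "RIBBON")); // per-table setting wins
    }
}
{code}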
was:
This patch implements the Ribbon Filter proposed in HBASE-27266.
[Design Docs|https://github.com/apache/hbase/blob/5298fd5951448b8b88ad29cb20819f34c19830e1/dev-support/design-docs/HBASE-29842%20Ribbon%20Filter%20Design.md]
h2. Summary
Ribbon Filter is a space-efficient alternative to Bloom Filter, achieving
approximately ~30% space savings while maintaining comparable query performance.
- Bloom Filter requires ~9.6 bits/key for 1% FPR (44% overhead vs theoretical
minimum)
- Ribbon Filter achieves ~7.3 bits/key for 1% FPR (~10% overhead)
h2. New BloomType Options
- {{{}RIBBON_ROW{}}}: Row-based Ribbon filter (alternative to {{{}ROW{}}})
- {{{}RIBBON_ROWCOL{}}}: Row+Column-based Ribbon filter (alternative to
{{{}ROWCOL{}}})
h3. Usage Example
*HBase Shell:*
{code:java}
create 'mytable', {NAME => 'cf', BLOOMFILTER => 'RIBBON_ROW'}
alter 'mytable', {NAME => 'cf', BLOOMFILTER => 'RIBBON_ROWCOL'}
{code}
*Java API:*
{code:java}
ColumnFamilyDescriptor cfd = ColumnFamilyDescriptorBuilder
.newBuilder(Bytes.toBytes("cf"))
.setBloomFilterType(BloomType.RIBBON_ROW)
.build();
{code}
> Add Ribbon Filter as an alternative to Bloom Filter
> ---------------------------------------------------
>
> Key: HBASE-29842
> URL: https://issues.apache.org/jira/browse/HBASE-29842
> Project: HBase
> Issue Type: New Feature
> Reporter: Jaehui Lee
> Assignee: Jaehui Lee
> Priority: Major
> Labels: pull-request-available