[ 
https://issues.apache.org/jira/browse/HBASE-29842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jaehui Lee updated HBASE-29842:
-------------------------------
    Description: 
This patch implements the Ribbon Filter proposed in HBASE-27266.

[Design Docs|https://github.com/Jaehui-Lee/hbase/blob/9b56b81f2663056925e359fe494680074ad33500/dev-support/design-docs/HBASE-29842%20Ribbon%20Filter%20Design.md]
h2. Summary

Ribbon Filter is a space-efficient alternative to Bloom Filter, achieving 
approximately 30% space savings while maintaining comparable query performance.
 - Bloom Filter requires ~9.6 bits/key for 1% FPR (~44% overhead vs. the theoretical minimum)
 - Ribbon Filter achieves ~7.3 bits/key for 1% FPR (~10% overhead vs. the theoretical minimum)
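
The overhead figures above can be checked with a little arithmetic. The sketch below is not part of the patch: the class and method names are hypothetical, it assumes the usual lower bound of log2(1/FPR) bits per key and a space-optimal Bloom filter at log2(1/FPR) / ln 2 bits per key, and it takes the 7.3 bits/key Ribbon figure from this summary as an input rather than deriving it.

```java
public class FilterOverhead {
  // Information-theoretic lower bound: log2(1 / fpr) bits per key.
  public static double theoreticalMinBits(double fpr) {
    return Math.log(1.0 / fpr) / Math.log(2.0);
  }

  // Space-optimal Bloom filter: log2(1 / fpr) / ln 2, about 1.44x the bound.
  public static double bloomBits(double fpr) {
    return theoreticalMinBits(fpr) / Math.log(2.0);
  }

  // Relative overhead of a bits/key figure over the lower bound (0.10 means 10%).
  public static double overhead(double bitsPerKey, double fpr) {
    return bitsPerKey / theoreticalMinBits(fpr) - 1.0;
  }

  public static void main(String[] args) {
    double fpr = 0.01;
    double ribbonBits = 7.3; // quoted in the summary, not derived here

    System.out.println("minimum bits/key: " + theoreticalMinBits(fpr));
    System.out.println("bloom bits/key:   " + bloomBits(fpr));
    System.out.println("bloom overhead:   " + overhead(bloomBits(fpr), fpr));
    System.out.println("ribbon overhead:  " + overhead(ribbonBits, fpr));
  }
}
```

For a 1% FPR the lower bound is about 6.64 bits/key, which puts the optimal Bloom filter near 9.6 bits/key (about 44% overhead) and the quoted Ribbon figure at about 10% overhead. The sketch checks only these per-key overhead figures and makes no attempt to reproduce the ~30% savings number.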

h2. Configuration

A new enum {{BloomFilterImpl}} is added to select the filter implementation 
independently from {{BloomType}}:
{code:java}
public enum BloomFilterImpl {
  BLOOM,   // Traditional Bloom filter (default)
  RIBBON   // Ribbon filter (more space-efficient)
}
{code}
h3. Per-Table Configuration

*HBase Shell:*
{code:ruby}
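# Row-level filter backed by the Ribbon implementation
create 'mytable', {NAME => 'cf', BLOOMFILTER => 'ROW', BLOOMFILTER_IMPL => 'RIBBON'}
# Alternatively, a row+column filter (use one form or the other; the table can only be created once)
create 'mytable', {NAME => 'cf', BLOOMFILTER => 'ROWCOL', BLOOMFILTER_IMPL => 'RIBBON'}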
{code}
*Java API:*
{code:java}
ColumnFamilyDescriptor cfd = ColumnFamilyDescriptorBuilder
  .newBuilder(Bytes.toBytes("cf"))
  .setBloomFilterType(BloomType.ROW)
  .setBloomFilterImpl(BloomFilterImpl.RIBBON)
  .build();
{code}
h3. Global Configuration

A global default can be set in {{hbase-site.xml}} (the value is case-insensitive):
{code:xml}
<property>
  <name>io.storefile.bloom.filter.impl</name>
  <value>RIBBON</value>
</property>
{code}
When both global and per-table settings exist, the per-table setting takes 
precedence.
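
The precedence rule can be sketched as a small resolver. This is an illustration only, not the code in the patch: the class {{BloomFilterImplResolver}} and its methods are hypothetical, the nested enum is a local stand-in for the {{BloomFilterImpl}} shown above so that the sketch compiles without HBase on the classpath, and it assumes that an unset value arrives as null or blank and that the per-table value is parsed as leniently as the global one.

```java
import java.util.Locale;

public class BloomFilterImplResolver {
  // Local stand-in for the enum in the description (assumption: same constants).
  public enum BloomFilterImpl { BLOOM, RIBBON }

  // Parses a setting case-insensitively; null or blank means "not set".
  static BloomFilterImpl parse(String value) {
    if (value == null || value.trim().isEmpty()) {
      return null;
    }
    return BloomFilterImpl.valueOf(value.trim().toUpperCase(Locale.ROOT));
  }

  // Per-table setting wins, then the global default, then BLOOM.
  public static BloomFilterImpl resolve(String perTable, String global) {
    BloomFilterImpl fromTable = parse(perTable);
    if (fromTable != null) {
      return fromTable;
    }
    BloomFilterImpl fromGlobal = parse(global);
    return fromGlobal != null ? fromGlobal : BloomFilterImpl.BLOOM;
  }

  public static void main(String[] args) {
    // Table says BLOOM, site config says RIBBON: the table setting is used.
    System.out.println(resolve("BLOOM", "RIBBON"));
    // Table is silent: the global default applies, regardless of case.
    System.out.println(resolve(null, "ribbon"));
    // Nothing configured anywhere: falls back to the enum's default.
    System.out.println(resolve(null, null));
  }
}
```

An unrecognized value makes {{valueOf}} throw {{IllegalArgumentException}} in this sketch; how the actual patch treats invalid values is not stated here.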

  was:
This patch implements the Ribbon Filter proposed in HBASE-27266.

[Design 
Docs|https://github.com/apache/hbase/blob/HBASE-29842/dev-support/design-docs/HBASE-29842%20Ribbon%20Filter%20Design.md]

h2. Summary

Ribbon Filter is a space-efficient alternative to Bloom Filter, achieving 
approximately ~30% space savings while maintaining comparable query performance.
 - Bloom Filter requires ~9.6 bits/key for 1% FPR (44% overhead vs theoretical 
minimum)
 - Ribbon Filter achieves ~7.3 bits/key for 1% FPR (~10% overhead)

h2. Configuration

A new enum {{BloomFilterImpl}} is added to select the filter implementation 
independently from {{BloomType}}:

{code:java}
public enum BloomFilterImpl {
  BLOOM,   // Traditional Bloom filter (default)
  RIBBON   // Ribbon filter (more space-efficient)
}
{code}

h3. Per-Table Configuration

*HBase Shell:*
{code:ruby}
create 'mytable', {NAME => 'cf', BLOOMFILTER => 'ROW', BLOOMFILTER_IMPL => 
'RIBBON'}
create 'mytable', {NAME => 'cf', BLOOMFILTER => 'ROWCOL', BLOOMFILTER_IMPL => 
'RIBBON'}
{code}

*Java API:*
{code:java}
ColumnFamilyDescriptor cfd = ColumnFamilyDescriptorBuilder
  .newBuilder(Bytes.toBytes("cf"))
  .setBloomFilterType(BloomType.ROW)
  .setBloomFilterImpl(BloomFilterImpl.RIBBON)
  .build();
{code}

h3. Global Configuration

A global default can be set in {{hbase-site.xml}} (case insensitive):
{code:xml}
<property>
  <name>io.storefile.bloom.filter.impl</name>
  <value>RIBBON</value>
</property>
{code}

When both global and per-table settings exist, the per-table setting takes 
precedence.



> Add Ribbon Filter as an alternative to Bloom Filter
> ---------------------------------------------------
>
>                 Key: HBASE-29842
>                 URL: https://issues.apache.org/jira/browse/HBASE-29842
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Jaehui Lee
>            Assignee: Jaehui Lee
>            Priority: Major
>              Labels: pull-request-available



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
