Ray Mattingly created HBASE-27786:
-------------------------------------

             Summary: CompoundBloomFilters break with an error rate that is too high
                 Key: HBASE-27786
                 URL: https://issues.apache.org/jira/browse/HBASE-27786
             Project: HBase
          Issue Type: Bug
    Affects Versions: 2.5.2
            Reporter: Ray Mattingly


At my company we're beginning to rely more heavily on the bloom error rate 
configuration. Bloom filters are a nice optimization, but for well-distributed 
workloads with relatively dense data (many rows per host), we've found that 
they can cause a lot of memory/GC pressure unless they fit entirely in the 
block cache (and consequently do not churn memory that's subject to GC).

Because it's easier to estimate the memory requirements of changes in existing 
bloom filters, rather than net new bloom filters, we wanted to begin with very 
high bloom error rates (and consequently small bloom filters), and then ratchet 
down as memory availability allowed.

This led us to discover that bloom filters appear to become corrupt at a 
seemingly arbitrary error rate threshold. Blooms with an error rate of 0.61 
work as expected, but blooms with an error rate of 0.62 produce nonsensical 
results. I've pushed this branch with test updates to demonstrate the defect: 
[https://github.com/apache/hbase/compare/master...HubSpot:hbase:rmattingly/bloom-error-rate-bug]

The test changes confirm that the BloomFilterUtil works as expected, at least 
with respect to its error rate : size relationship. You can see this in the 
output of {{TestBloomFilterChunk#testBloomErrorRateSizeRelationship}}:

 
{noformat}
previousErrorRate=0.01, previousSize=1048568
currentErrorRate=0.05, currentSize=682109
previousErrorRate=0.05, previousSize=682109
currentErrorRate=0.1, currentSize=524284
previousErrorRate=0.1, previousSize=524284
currentErrorRate=0.2, currentSize=366459
previousErrorRate=0.2, previousSize=366459
currentErrorRate=0.4, currentSize=208634
previousErrorRate=0.4, previousSize=208634
currentErrorRate=0.5, currentSize=157826
previousErrorRate=0.5, previousSize=157826
currentErrorRate=0.75, currentSize=65504
previousErrorRate=0.75, previousSize=65504
currentErrorRate=0.99, currentSize=2289
{noformat}
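
For reference, the sizes above look consistent with the textbook bloom sizing formula, m = -n * ln(p) / (ln 2)^2, where the size for a fixed key count scales with ln(p). The following is only a minimal sketch of that formula, not HBase's actual {{BloomFilterUtil}} code; the {{BloomSizing}} class, its method names, and the key count are hypothetical and exist purely for illustration.

```java
public class BloomSizing {
    private static final double LN2_SQUARED = Math.log(2) * Math.log(2);

    /** Textbook ideal bit count for `keys` entries at false-positive rate `errorRate`. */
    public static long idealBits(long keys, double errorRate) {
        return (long) Math.ceil(keys * (-Math.log(errorRate) / LN2_SQUARED));
    }

    /** Ratio of ideal sizes at two error rates for the same key count: ln(p) / ln(p0). */
    public static double sizeRatio(double errorRate, double baselineErrorRate) {
        return Math.log(errorRate) / Math.log(baselineErrorRate);
    }

    public static void main(String[] args) {
        // Same error rates as the test output; the key count is an arbitrary assumption.
        double[] rates = {0.01, 0.05, 0.1, 0.2, 0.4, 0.5, 0.75, 0.99};
        long keys = 100_000;
        for (double p : rates) {
            System.out.printf("errorRate=%.2f idealBits=%d ratioToFirst=%.4f%n",
                p, idealBits(keys, p), sizeRatio(p, rates[0]));
        }
    }
}
```

As a sanity check against the output above, 524284 / 1048568 = 0.5, which matches ln(0.1) / ln(0.01), and 682109 / 1048568 is roughly 0.6505, which matches ln(0.05) / ln(0.01). Under this formula a bloom at error rate 0.62 should simply be somewhat smaller than one at 0.61.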
 

With this in mind, the updates to {{TestCompoundBloomFilter}} tell us that the 
bug must live somewhere in the {{CompoundBloomFilter}} logic. The output 
indicates this:

 
{noformat}
2023-04-10T15:07:50,925 INFO  [Time-limited test] 
regionserver.TestCompoundBloomFilter(245): Functional bloom has error rate 0.01 
and size 1kb
...
2023-04-10T15:07:56,657 INFO  [Time-limited test] 
regionserver.TestCompoundBloomFilter(245): Functional bloom has error rate 0.61 
and size 1kb
...
java.lang.AssertionError: False positive is too high: 0.9998533333333334 
(greater than 0.65), fake lookup is enabled. Bloom size is 4687kb
    at org.junit.Assert.fail(Assert.java:89)
    at org.junit.Assert.assertTrue(Assert.java:42)
    at 
org.apache.hadoop.hbase.regionserver.TestCompoundBloomFilter.readStoreFile(TestCompoundBloomFilter.java:243)
{noformat}
 

The jump in bloom size from ~1kb to 4687kb and the near-total loss of 
precision (a false positive rate of ~0.9999) are clearly not intended, and are 
entirely in line with what we saw in our HBase clusters that attempted to use 
high bloom error rates.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
