snleee edited a comment on issue #3528: Adding support for bloom filter
URL: https://github.com/apache/incubator-pinot/pull/3528#issuecomment-442704522
 
 
   When I tried to limit the bloom filter size to 1MB (by computing the max false 
positive probability from the formula), I found that the clearspring implementation 
does not behave as expected for high-cardinality cases, while the Guava 
implementation works as expected. Please refer to the bloom filter sizes below 
(Guava's bloom filter size is correctly capped at <1MB, while the clearspring 
implementation uses a larger size). It seems that Guava's implementation is more 
robust at capping the maximum size of the bloom filter. @kishoreg 
   
   ```
   cardinality: 1,000,000
   numHashFunction: 4
   maxFalsePosProbability: 0.05
   
   numBitsRequired (Estimated): 6235225
   requiredSize (Estimated): 779403
   clearspring size: 875085
   Guava size: 779414
   ---------------------------------------------
   
   cardinality: 3,000,000
   numHashFunction: 2
   maxFalsePosProbability: 0.2610525068636746
   
   numBitsRequired (Estimated): 8386047
   requiredSize (Estimated): 1048255
   clearspring size: 1125085
   Guava size: 1048262
   ---------------------------------------------
   
   cardinality: 5,000,000
   numHashFunction: 1
   maxFalsePosProbability: 0.4490143136505804
   
   
   numBitsRequired (Estimated): 8332767
   requiredSize (Estimated): 1041595
   clearspring size: 625085
   Guava size: 1041606
   
   ---------------------------------------------
   
   cardinality: 10,000,000
   numHashFunction: 1
   maxFalsePosProbability: 0.696414773438059
   
   numBitsRequired (Estimated): 7530599
   requiredSize (Estimated): 941324
   clearspring size: 1250085
   Guava size: 941334
   ---------------------------------------------
   
   cardinality: 30,000,000
   numHashFunction: 1
   maxFalsePosProbability: 0.9720203742797628
   
   numBitsRequired (Estimated): 1771985
   requiredSize (Estimated): 221498
   clearspring size: 3750085
   Guava size: 221510
   ----------------------------------------------
   
   cardinality: 50,000,000
   numHashFunction: 1
   maxFalsePosProbability: 0.9974212860608853
   
   numBitsRequired (Estimated): 268710
   requiredSize (Estimated): 33588
   clearspring size: 6250085
   Guava size: 33598
   ```
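
The estimated figures above appear consistent with the standard Bloom filter sizing formulas, so a minimal sketch of that arithmetic may help when reading the numbers. This is an assumption, not the code used in the PR: the function names `max_false_positive` and `required_bits` are hypothetical, 1MB is taken to be 2^20 bytes, and the byte count is taken to be the bit count divided by 8 and rounded down.

```python
import math

ONE_MB = 1 << 20  # assumed meaning of "1MB": 1,048,576 bytes


def max_false_positive(cardinality, max_bytes, num_hashes):
    """False-positive rate of a filter with max_bytes * 8 bits and
    num_hashes hash functions after `cardinality` insertions:
    p = (1 - e^(-k*n/m))^k."""
    m = max_bytes * 8
    return (1.0 - math.exp(-num_hashes * cardinality / m)) ** num_hashes


def required_bits(cardinality, fpp):
    """Optimal number of bits for `cardinality` entries at the given
    false-positive probability: m = -n * ln(p) / (ln 2)^2, rounded up."""
    return math.ceil(-cardinality * math.log(fpp) / (math.log(2) ** 2))


# First case in the log: cardinality 1,000,000 at fpp 0.05.
# The log lists 6235225 bits and 779403 bytes for this case.
bits = required_bits(1_000_000, 0.05)
print(bits, bits // 8)

# High-cardinality case: derive the fpp implied by a 1MB cap first,
# then size the filter from that fpp (hypothetical reconstruction).
fpp = max_false_positive(10_000_000, ONE_MB, 1)
print(fpp, required_bits(10_000_000, fpp) // 8)
```

Under these assumptions, the fpp values in the log are the probabilities implied by a 1MB budget, and the estimated size is then recomputed from that probability. The actual serialized sizes differ slightly from the estimate because each library adds its own header and rounding.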
