snleee edited a comment on issue #3528: Adding support for bloom filter
URL: https://github.com/apache/incubator-pinot/pull/3528#issuecomment-442704522

When I tried to limit the bloom filter size to 1MB (by computing the maximum false positive probability from the formula), I found that the clearspring implementation does not behave as expected at high cardinality, while the Guava implementation does. Please compare the bloom filter sizes below: Guava's size is correctly capped below 1MB, while the clearspring implementation uses a larger size. Guava's implementation seems more robust when capping the maximum size of the bloom filter. @kishoreg

```
cardinality: 1,000,000
maxFalsePosProbability: 0.05
numBitsRequired (Estimated): 6235225
requiredSize (Estimated): 779403
clear spring size: 875085
Gauva size: 779414
Roaring bitmap size: 507068
---------------------------------------------
cardinality: 3,000,000
numHashFunction: 2
maxFalsePosProbability: 0.2610525068636746
numBitsRequired (Estimated): 8386047
requiredSize (Estimated): 1048255
clear spring size: 1125085
Gauva size: 1048262
Roaring bitmap size: 742946
---------------------------------------------
cardinality: 5,000,000
numHashFunction: 1
maxFalsePosProbability: 0.4490143136505804
numBitsRequired (Estimated): 8332767
requiredSize (Estimated): 1041595
clear spring size: 625085
Gauva size: 1041606
---------------------------------------------
cardinality: 10,000,000
numHashFunction: 1
maxFalsePosProbability: 0.696414773438059
numBitsRequired (Estimated): 7530599
requiredSize (Estimated): 941324
clear spring size: 1250085
Gauva size: 941334
---------------------------------------------
cardinality: 30,000,000
numHashFunction: 1
maxFalsePosProbability: 0.9720203742797628
numBitsRequired (Estimated): 1771985
requiredSize (Estimated): 221498
clear spring size: 3750085
Gauva size: 221510
---------------------------------------------
cardinality: 50,000,000
numHashFunction: 1
maxFalsePosProbability: 0.9974212860608853
numBitsRequired (Estimated): 268710
requiredSize (Estimated): 33588
clear spring size: 6250085
Gauva size: 33598
```
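For readers who want to check the estimates above, here is a minimal Python sketch of the sizing arithmetic. The comment does not show the formula it used, so everything here is an assumption: the textbook bloom filter formulas, a budget of 1024 * 1024 bytes, and a Guava serialized layout of 6 header bytes followed by the bit array padded to 64-bit words. The function names are hypothetical and do not come from Pinot, Guava, or clearspring. Under those assumptions the sketch appears to land on the reported `numHashFunction`, `maxFalsePosProbability`, and `Gauva size` values to within rounding.

```python
import math

ONE_MB_BITS = 1024 * 1024 * 8  # assumed size budget: 1 MB expressed in bits


def num_hash_functions(cardinality, num_bits=ONE_MB_BITS):
    """Textbook optimum k = (m / n) * ln 2, rounded and clamped to at least 1."""
    return max(1, round(num_bits / cardinality * math.log(2)))


def max_false_positive(cardinality, num_bits=ONE_MB_BITS):
    """False positive rate for the bit budget: (1 - e^(-k*n/m))^k."""
    k = num_hash_functions(cardinality, num_bits)
    return (1.0 - math.exp(-k * cardinality / num_bits)) ** k


def required_bits(cardinality, fpp):
    """Textbook sizing m = -n * ln(p) / (ln 2)^2, rounded up."""
    return math.ceil(-cardinality * math.log(fpp) / math.log(2) ** 2)


def guava_serialized_bytes(num_bits):
    """Assumed layout: 6 header bytes plus the bits padded to whole 64-bit words."""
    return 6 + 8 * ((num_bits + 63) // 64)


if __name__ == "__main__":
    # Cardinalities from the comment for which a capped probability was reported.
    for n in (3_000_000, 5_000_000, 10_000_000, 30_000_000, 50_000_000):
        p = max_false_positive(n)
        bits = required_bits(n, p)
        print(n, num_hash_functions(n), p, bits, guava_serialized_bytes(bits))
```

One observation from the reported numbers alone, not from the clearspring source: each clearspring size equals 85 bytes plus a whole number of bits per element (7, 3, 1, 1, 1, 1 for the six cardinalities). If that pattern holds, the filter cannot shrink below one bit per element, which would explain why its size keeps growing past the cap as cardinality increases.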
