Re: [PR] [Docs] Improve Bloom filter topic (druid)

via GitHub Tue, 10 Dec 2024 08:11:17 -0800


ektravel commented on code in PR #17547:
URL: https://github.com/apache/druid/pull/17547#discussion_r1878389473



##########
docs/development/extensions-core/bloom-filter.md:
##########
@@ -23,28 +23,25 @@ title: "Bloom Filter"
   -->
 
 
-To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-bloom-filter` in the extensions load list.
+To use the Apache Druid&circledR; Bloom filter extension, include `druid-bloom-filter` in the extensions load list. See [Loading extensions](../../configuration/extensions.md#loading-extensions) for more information.
 
-This extension adds the ability to both construct bloom filters from query results, and filter query results by testing
-against a bloom filter. A Bloom filter is a probabilistic data structure for performing a set membership check. A bloom
-filter is a good candidate to use with Druid for cases where an explicit filter is impossible, e.g. filtering a query
+This extension adds the ability to both construct Bloom filters from query results, and filter query results by testing
+against a Bloom filter. A Bloom filter is a probabilistic data structure for performing a set membership check. A Bloom
+filter is a good candidate to use with Druid for cases where an explicit filter is impossible, such as filtering a query
 against a set of millions of values.
 
 Following are some characteristics of Bloom filters:
 
-- Bloom filters are highly space efficient when compared to using a HashSet.
-- Because of the probabilistic nature of bloom filters, false positive results are possible (element was not actually
-inserted into a bloom filter during construction, but `test()` says true)
-- False negatives are not possible (if element is present then `test()` will never say false).
-- The false positive probability of this implementation is currently fixed at 5%, but increasing the number of entries
-that the filter can hold can decrease this false positive rate in exchange for overall size.
-- Bloom filters are sensitive to number of elements that will be inserted in the bloom filter. During the creation of bloom filter expected number of entries must be specified. If the number of insertions exceed
- the specified initial number of entries then false positive probability will increase accordingly.
+- Bloom filters are highly space efficient compared to using a HashSet.
+- Because of the probabilistic nature of Bloom filters, false positive results are possible. For example, the `test()` function might return `true` for an element that wasn't inserted into the filter.

Review Comment:
   ```suggestion
   - Because they are probabilistic, false positive results are possible with Bloom filters. For example, the `test()` function might return `true` for an element that is not within the filter.
   ```
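
To make the behavior described in this bullet concrete, here is a minimal Python sketch of a Bloom filter. It is not Druid's implementation: the `BloomFilter` class, its `add()` method, the SHA-256 double-hashing scheme, and the sizing formulas are assumptions chosen for illustration. Only the `test()` name and the 5% false positive target come from the text in the diff.

```python
import hashlib
import math


class BloomFilter:
    """Toy Bloom filter for illustration only (hypothetical, not Druid's code)."""

    def __init__(self, expected_entries: int, fpp: float = 0.05):
        # Textbook sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.num_bits = max(1, math.ceil(-expected_entries * math.log(fpp) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / expected_entries * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, value: str):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def test(self, value: str) -> bool:
        # True means "possibly present"; False means "definitely not inserted".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(value))


bf = BloomFilter(expected_entries=1000)
inserted = [f"user-{i}" for i in range(1000)]
for v in inserted:
    bf.add(v)

# No false negatives: every inserted value tests true.
no_false_negatives = all(bf.test(v) for v in inserted)

# False positives: some values that were never inserted still test true.
probes = [f"other-{i}" for i in range(10000)]
fp_rate = sum(bf.test(p) for p in probes) / len(probes)
print(no_false_negatives, fp_rate)
```

With this sizing the measured false positive rate should land in the neighborhood of the 5% target, and it climbs if more values are inserted than `expected_entries` allows for.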



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


