gianm commented on a change in pull request #6222: Add ability to pass in Bloom filter from Hive Queries
URL: https://github.com/apache/incubator-druid/pull/6222#discussion_r220571447
 
 

 ##########
 File path: docs/content/development/extensions-core/bloom-filter.md
 ##########
 @@ -0,0 +1,45 @@
+---
+layout: doc_page
+---
+
+# Druid Bloom Filter
+
+Make sure to [include](../../operations/including-extensions.html) `druid-bloom-filter` as an extension.
+
+BloomFilter is a probabilistic data structure for set membership checks.
+Following are some characteristics of BloomFilter:
+- BloomFilters are highly space efficient compared to using a HashSet.
+- Because of the probabilistic nature of a bloom filter, false positives (element not present in the bloom filter but test() returns true) are possible.
+- False negatives are not possible (if an element is present, then test() will never return false).
+- The false positive probability is configurable (default: 5%); the storage requirement increases or decreases depending on it.
+- The lower the false positive probability, the greater the space requirement.
+- Bloom filters are sensitive to the number of elements that will be inserted into the bloom filter.
+- During the creation of a bloom filter, the expected number of entries must be specified. If the number of insertions exceeds the specified initial number of entries, the false positive probability will increase accordingly.
+
+Internally, this implementation of bloom filter uses Murmur3, a fast non-cryptographic hash algorithm.
+
+### JSON Representation of Bloom Filter
+```json
+{
+  "type" : "bloom",
+  "dimension" : <dimension_name>,
+  "bloomKFilter" : <serialized_bytes_for_BloomKFilter>,
+  "extractionFn" : <extraction_fn>
+}
+```
+
+|Property                 |Description                   |required?                         |
+|-------------------------|------------------------------|----------------------------------|
+|`type`                   |Filter Type. Should always be `bloom`|yes|
+|`dimension`              |The dimension to filter over. | yes |
+|`bloomKFilter`           |Binary representation of `org.apache.hive.common.util.BloomKFilter`| yes |
 
 Review comment:
   To be clear, if we did evolve it that way, it should still support the hive 
format (to work with the original use case from @nishantmonu51). I just think 
there is a lot of potential for this extension to be useful beyond hive 
integration.
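
As background for the properties listed in the quoted documentation, here is a minimal Python sketch of a Bloom filter. It is not the `BloomKFilter` implementation referenced in the PR: it swaps in SHA-256 with double hashing in place of Murmur3, and the names `optimal_size` and `SimpleBloomFilter` are hypothetical. The sizing uses the standard textbook formulas (bits m = -n ln p / (ln 2)^2, hash count k = (m / n) ln 2), which may differ from what `BloomKFilter` actually computes.

```python
import hashlib
import math


def optimal_size(expected_entries: int, fpp: float) -> tuple[int, int]:
    """Return (num_bits, num_hashes) from the textbook Bloom filter formulas."""
    num_bits = math.ceil(-expected_entries * math.log(fpp) / (math.log(2) ** 2))
    num_hashes = max(1, round(num_bits / expected_entries * math.log(2)))
    return num_bits, num_hashes


class SimpleBloomFilter:
    """Illustrative Bloom filter; assumption: SHA-256 stands in for Murmur3."""

    def __init__(self, expected_entries: int, fpp: float = 0.05):
        self.num_bits, self.num_hashes = optimal_size(expected_entries, fpp)
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, value: str):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def test(self, value: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(value))


if __name__ == "__main__":
    bloom = SimpleBloomFilter(expected_entries=1000, fpp=0.05)
    for i in range(1000):
        bloom.add(f"member-{i}")
    # Every inserted value tests true; some never-inserted values may also test true.
    false_positives = sum(bloom.test(f"absent-{i}") for i in range(10000))
    print("bits at 5% fpp:", optimal_size(1000, 0.05)[0])
    print("bits at 1% fpp:", optimal_size(1000, 0.01)[0])
    print("observed false positive rate:", false_positives / 10000)
```

Running the script prints the bit counts for two target probabilities and the observed false positive rate, which illustrates the space versus accuracy trade-off described in the list. Inserting more than `expected_entries` values into the same filter would push the observed rate above the configured target.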
