clintropolis commented on a change in pull request #6397: Adds bloom filter 
aggregator to 'druid-bloom-filters' extension
URL: https://github.com/apache/incubator-druid/pull/6397#discussion_r246884596
 
 

 ##########
 File path: docs/content/development/extensions-core/bloom-filter.md
 ##########
 @@ -42,4 +50,53 @@ Internally, this implementation of bloom filter uses Murmur3 fast non-cryptograp
   - 1 big endian int(That is how OutputStream works) for the number of longs in the bitset
   - big endian longs in the BloomKFilter bitset
      
-Note: `org.apache.hive.common.util.BloomKFilter` provides a serialize method which can be used to serialize bloom filters to outputStream.
\ No newline at end of file
+Note: `org.apache.hive.common.util.BloomKFilter` provides a serialize method which can be used to serialize bloom filters to outputStream.
+
+## Bloom Filter Query Aggregator
+Input for a `bloomKFilter` can also be created from a Druid query with the `bloom` aggregator.
+
+### JSON Specification of Bloom Filter Aggregator
+```json
+{
+  "type": "bloom",
+  "name": <output_field_name>,
+  "maxNumEntries": <maximum_number_of_elements_for_BloomKFilter>,
+  "field": <dimension_spec>
+}
+```
+
+|Property                 |Description                   |Required?                         |
+|-------------------------|------------------------------|----------------------------------|
+|`type`                   |Aggregator type. Should always be `bloom`|yes|
+|`name`                   |Output field name |yes|
+|`field`                  |[DimensionSpec](./../dimensionspecs.html) to add to `org.apache.hive.common.util.BloomKFilter` | yes |
+|`maxNumEntries`          |Maximum number of distinct values supported by `org.apache.hive.common.util.BloomKFilter`, default `1500`| no |
 
 Review comment:
   Updated the docs to mention the fixed 5% false positive rate, though there is 
no formula yet for how changing `maxNumEntries` affects it.
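
   As a rough illustration of how `maxNumEntries` could relate to filter size at a fixed 5% false positive rate, here is a minimal sketch using the textbook bloom filter sizing formulas. It assumes `BloomKFilter` sizes itself along these lines; the function names are hypothetical, and the real implementation may truncate or round the bit count differently (for example, to whole longs).

```python
import math


def optimal_num_bits(max_num_entries: int, fpp: float = 0.05) -> int:
    """Textbook bit count m = -n * ln(p) / (ln 2)^2, rounded up.

    Assumption: this is the standard formula, not confirmed to match
    BloomKFilter's exact rounding.
    """
    return math.ceil(-max_num_entries * math.log(fpp) / (math.log(2) ** 2))


def optimal_num_hashes(max_num_entries: int, num_bits: int) -> int:
    """Textbook hash count k = (m / n) * ln 2, at least 1."""
    return max(1, round(num_bits / max_num_entries * math.log(2)))


def expected_fpp(num_entries: int, num_bits: int, num_hashes: int) -> float:
    """Approximate false positive rate (1 - e^(-k*n/m))^k after n insertions."""
    return (1.0 - math.exp(-num_hashes * num_entries / num_bits)) ** num_hashes


if __name__ == "__main__":
    for n in (1500, 15000, 150000):
        m = optimal_num_bits(n)
        k = optimal_num_hashes(n, m)
        print(f"maxNumEntries={n}: bits={m}, hashes={k}, "
              f"fpp at capacity={expected_fpp(n, m, k):.4f}, "
              f"fpp at 2x capacity={expected_fpp(2 * n, m, k):.4f}")
```

   Under these assumptions the bit count grows linearly with `maxNumEntries`, so the false positive rate stays near 5% only while the number of distinct values added stays at or below `maxNumEntries`; adding more values than that pushes the rate up.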

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
