clintropolis commented on a change in pull request #5712: HllSketch module
URL: https://github.com/apache/incubator-druid/pull/5712#discussion_r217679221
##########
File path: docs/content/development/extensions-core/datasketches-hll.md
##########
@@ -0,0 +1,82 @@
+---
+layout: doc_page
+---
+
+## DataSketches HLL Sketch module
+
+This module provides Druid aggregators for distinct counting based on HLL sketch from [datasketches](http://datasketches.github.io/) library. At ingestion time, this aggregator creates the HLL sketch objects to be stored in Druid segments. At query time, sketches are read and merged together. In the end, by default, you receive the estimate of the number of distinct values presented to the sketch. Also, you can use post aggregator to produce a union of sketch columns in the same row.
+You can use the HLL sketch aggregator on columns of any identifiers. It will return estimated cardinality of the column.
+
+To use this aggregator, make sure you [include](../../operations/including-extensions.html) the extension in your config file:
+
+```
+druid.extensions.loadList=["druid-datasketches"]
+```
+
+### Aggregators
+
+```json

Review comment:
I don't think these blocks should be marked as `json`: with the `<` and `>` placeholders they aren't technically valid JSON, which causes the markdown to render funny.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

With regards,
Apache Git Services
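To illustrate the review comment, here is a minimal sketch of why a snippet containing angle-bracket placeholders is not valid JSON. The snippet text, the `exampleAggregator` type, and the field values are hypothetical and are not taken from the pull request; only Python's standard `json` module is used.

```python
import json

# Hypothetical documentation snippets (assumptions, not the PR's actual text).
# TEMPLATE uses an angle-bracket placeholder, as doc templates often do.
TEMPLATE = '{"type": "exampleAggregator", "name": <output name>}'
# FILLED is the same object with the placeholder replaced by a real string.
FILLED = '{"type": "exampleAggregator", "name": "distinct_users"}'


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json(TEMPLATE))
print(is_valid_json(FILLED))
```

The template fails to parse while the filled-in object succeeds, which is consistent with a syntax highlighter treating the `<` and `>` placeholders as errors when the fence is tagged `json`.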
