This is an automated email from the ASF dual-hosted git repository.
fjy pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-druid.git
The following commit(s) were added to refs/heads/master by this push:
new 0f6cb1e Update theta/hll sketch doc comparison (#7407)
0f6cb1e is described below
commit 0f6cb1e7e032081a569bbcf0c89220f0e6b53472
Author: Jonathan Wei <[email protected]>
AuthorDate: Wed Apr 3 15:21:33 2019 -0700
Update theta/hll sketch doc comparison (#7407)
---
docs/content/querying/aggregations.md | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)
diff --git a/docs/content/querying/aggregations.md b/docs/content/querying/aggregations.md
index eef4c68..2e253fe 100644
--- a/docs/content/querying/aggregations.md
+++ b/docs/content/querying/aggregations.md
@@ -275,19 +275,28 @@ The [DataSketches Theta
Sketch](../development/extensions-core/datasketches-thet
#### DataSketches HLL Sketch
-The [DataSketches HLL
Sketch](../development/extensions-core/datasketches-hll.html)
extension-provided aggregator gives distinct count estimates using the
HyperLogLog algorithm. The HLL Sketch is faster and requires less storage than
the Theta Sketch, but does not support intersection or difference operations.
+The [DataSketches HLL
Sketch](../development/extensions-core/datasketches-hll.html)
extension-provided aggregator gives distinct count estimates using the
HyperLogLog algorithm.
+
+Compared to the Theta sketch, the HLL sketch does not support set operations
and has slightly slower update and merge speed, but requires significantly less
space.
#### Cardinality/HyperUnique (Deprecated)
<div class="note caution">
-The Cardinality and HyperUnique aggregators are deprecated. Please use <a
href="../development/extensions-core/datasketches-hll.html">DataSketches HLL
Sketch</a> instead.
+The Cardinality and HyperUnique aggregators are deprecated. Please use <a
href="../development/extensions-core/datasketches-theta.html">DataSketches
Theta Sketch</a> or <a
href="../development/extensions-core/datasketches-hll.html">DataSketches HLL
Sketch</a> instead.
</div>
-The [Cardinality and HyperUnique](../querying/hll-old.html) aggregators are
older aggregator implementations available by default in Druid that also
provide distinct count estimates using the HyperLogLog algorithm. The newer
[DataSketches HLL Sketch](../development/extensions-core/datasketches-hll.html)
extension-provided aggregator has superior accuracy and performance and is
recommended instead.
+The [Cardinality and HyperUnique](../querying/hll-old.html) aggregators are
older aggregator implementations available by default in Druid that also
provide distinct count estimates using the HyperLogLog algorithm. The newer
DataSketches Theta and HLL extension-provided aggregators described above have
superior accuracy and performance and are recommended instead.
The DataSketches team has published a [comparison
study](https://datasketches.github.io/docs/HLL/HllSketchVsDruidHyperLogLogCollector.html)
between Druid's original HLL algorithm and the DataSketches HLL algorithm.
Based on the demonstrated advantages of the DataSketches implementation, we
have deprecated Druid's original HLL aggregator.
-Please note that DataSketches HLL aggregators and `hyperUnique` aggregators
are not mutually compatible.
+Please note that `hyperUnique` aggregators are not mutually compatible with
DataSketches HLL or Theta sketches.
+
+##### Multi-column handling
+
+Note the DataSketches Theta and HLL aggregators currently only support
single-column inputs. If you were previously using the Cardinality aggregator
with multiple-column inputs, equivalent operations using Theta or HLL sketches
are described below:
+
+* Multi-column `byValue` Cardinality can be replaced with a union of Theta
sketches on the individual input columns
+* Multi-column `byRow` Cardinality can be replaced with a Theta or HLL sketch
on a single [virtual column](../querying/virtual-columns.html) that combines
the individual input columns.
### Histograms and quantiles
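
For readers of this commit, the two replacements listed under "Multi-column handling" might look roughly like the query fragments built below. This is a minimal sketch and not part of the commit: the column names (`col_a`, `col_b`), the output names, and the `|` separator are hypothetical, and it assumes the `thetaSketch` aggregator, the `thetaSketchSetOp` and `thetaSketchEstimate` post-aggregators, and `expression` virtual columns with `concat` are available in your Druid version.

```python
import json


def by_row_query_parts(columns, separator="|"):
    """byRow replacement: one Theta sketch over a virtual column
    that concatenates the input columns (names are hypothetical)."""
    pieces = []
    for i, col in enumerate(columns):
        if i > 0:
            # Assumption: a separator that never occurs in the data, so
            # ("ab", "c") and ("a", "bc") do not collapse to the same value.
            pieces.append("'" + separator + "'")
        pieces.append(col)
    virtual_column = {
        "type": "expression",
        "name": "combined_cols",
        "expression": "concat(" + ", ".join(pieces) + ")",
        "outputType": "STRING",
    }
    aggregator = {
        "type": "thetaSketch",
        "name": "distinct_rows",
        "fieldName": "combined_cols",
    }
    return {"virtualColumns": [virtual_column], "aggregations": [aggregator]}


def by_value_query_parts(columns):
    """byValue replacement: one Theta sketch per column, unioned and
    estimated in a post-aggregator."""
    aggregations = [
        {"type": "thetaSketch", "name": col + "_sketch", "fieldName": col}
        for col in columns
    ]
    union = {
        "type": "thetaSketchSetOp",
        "name": "union_sketch",
        "func": "UNION",
        "fields": [
            {"type": "fieldAccess", "fieldName": agg["name"]}
            for agg in aggregations
        ],
    }
    estimate = {
        "type": "thetaSketchEstimate",
        "name": "distinct_values",
        "field": union,
    }
    return {"aggregations": aggregations, "postAggregations": [estimate]}


if __name__ == "__main__":
    print(json.dumps(by_row_query_parts(["col_a", "col_b"]), indent=2))
    print(json.dumps(by_value_query_parts(["col_a", "col_b"]), indent=2))
```

The returned dictionaries are meant to be merged into a larger native query (for example a timeseries query) alongside `dataSource`, `intervals`, and `granularity`; swapping `thetaSketch` for an HLL sketch aggregator in the `byRow` case should work the same way, since only a single input column is involved.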