linrrzqqq commented on code in PR #3711:
URL: https://github.com/apache/doris-website/pull/3711#discussion_r3328894500
##########
docs/sql-manual/sql-functions/aggregate-functions/datasketches_hll_union_agg.md:
##########
@@ -0,0 +1,101 @@
+{
+"title": "DATASKETCHES_HLL_UNION_AGG",
+"language": "en",
+"description": "The datasketches_hll_union_agg function is an aggregate function used to union multiple Apache DataSketches HLL sketches and return the estimated cardinality of the union as a DOUBLE value."
+}
+---
+
+## Description
+
+`datasketches_hll_union_agg` is an aggregate function used to **union** multiple Apache DataSketches **HLL** (`hll_sketch`) serialized values and return the **estimated cardinality** (approximate distinct count / NDV) after union.
+
+This function expects the input to be **serialized bytes of a DataSketches HLL sketch** (for example, generated by `hll_sketch.serialize_compact()` in the DataSketches library). It does not accept arbitrary strings.
+
+Aliases:
+
+- `ds_hll_estimate`
+- `datasketches_hll_estimate`
+
+## Syntax
+
+```sql
+datasketches_hll_union_agg(<sketch>)
+```
+
+## Parameters
+
+| Parameter | Description |
+| -- | -- |
+| `<sketch>` | The serialized bytes of an Apache DataSketches HLL sketch. Supported types: STRING / VARCHAR / BINARY / VARBINARY. NULL values are ignored. Empty strings are treated as invalid input and will throw an error. |
+
+## Return Value
+
+Returns a DOUBLE (Float64) cardinality estimate value.
+If there is no valid data in the group (or the input is empty), returns 0.
+If the input bytes cannot be deserialized as a valid DataSketches HLL sketch (including empty string), an error is thrown (typically with error code `CORRUPTION`).
+
+## Example
+
+```sql
+-- setup
+CREATE TABLE test_datasketches_hll_union_agg_tbl (
+ id INT,
+ sk STRING
+)
+DISTRIBUTED BY HASH(id) BUCKETS 1
+PROPERTIES ("replication_num" = "1");
+
+-- The sketch bytes are inserted via Base64 decoding.
+INSERT INTO test_datasketches_hll_union_agg_tbl VALUES
+ (1, from_base64('AgEHCAMIBwjL18IEK/L7BoYv+Q11gWYHgbxdBntl5gj8LUIK')),
+ (2, from_base64('AwEHCAUIAAkKAAAAIjvrBcS1nwfGGWoEyHokBO8t9wc1qTEENkcJB7hWqQxZf9QNnuSbGA==')),
+ (3, NULL);
+```
+
+```sql
+-- The function returns DOUBLE, so use ROUND/CAST if you want an integer display.
+SELECT CAST(ROUND(datasketches_hll_union_agg(sk)) AS BIGINT)
+FROM test_datasketches_hll_union_agg_tbl;
+```
+
+```text
++------------------------------------------------------+
+| CAST(ROUND(datasketches_hll_union_agg(sk)) AS BIGINT) |
++------------------------------------------------------+
+| 17 |
++------------------------------------------------------+
+```
+
+```sql
+-- aliases
+SELECT
+ CAST(ROUND(datasketches_hll_union_agg(sk)) AS BIGINT),
+ CAST(ROUND(ds_hll_estimate(sk)) AS BIGINT),
+ CAST(ROUND(datasketches_hll_estimate(sk)) AS BIGINT)
+FROM test_datasketches_hll_union_agg_tbl;
Review Comment:
result
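As a rough cross-check of the `17` in the first example's output, the two Base64 sample values can be decoded and their preambles read client-side. This is a minimal Python sketch, assuming the HLL preamble layout used in the Apache DataSketches sources: byte 0 is the preamble size in ints, byte 2 the family ID (7 for HLL), byte 3 `lgK`, byte 5 the flags (0x08 = compact), byte 6 the coupon count in LIST mode, byte 7 the mode byte (bits 0-1 current mode, bits 2-3 target HLL type), and, in SET mode, a 4-byte little-endian coupon count at offset 8. `read_preamble`, `SAMPLES`, and the lookup-table names are hypothetical and only for illustration.

```python
import base64
import struct

# Base64 values copied from the INSERT statement in the example (rows 1 and 2).
SAMPLES = {
    1: "AgEHCAMIBwjL18IEK/L7BoYv+Q11gWYHgbxdBntl5gj8LUIK",
    2: "AwEHCAUIAAkKAAAAIjvrBcS1nwfGGWoEyHokBO8t9wc1qTEENkcJB7hWqQxZf9QNnuSbGA==",
}

HLL_FAMILY_ID = 7
COMPACT_FLAG = 0x08
CUR_MODES = {0: "LIST", 1: "SET", 2: "HLL"}
TGT_HLL_TYPES = {0: "HLL_4", 1: "HLL_6", 2: "HLL_8"}


def read_preamble(sketch: bytes) -> dict:
    """Read the fixed preamble of a serialized HLL sketch (assumed layout)."""
    if len(sketch) < 8:
        raise ValueError("input is too short to be an HLL sketch")
    pre_ints, ser_ver, family, lg_k, lg_arr, flags, byte6, mode_byte = sketch[:8]
    if family != HLL_FAMILY_ID:
        raise ValueError(f"family id {family} is not HLL ({HLL_FAMILY_ID})")
    mode = CUR_MODES.get(mode_byte & 0x3, "UNKNOWN")  # bits 0-1 of the mode byte
    info = {
        "pre_ints": pre_ints,
        "ser_ver": ser_ver,
        "lg_k": lg_k,
        "lg_arr": lg_arr,
        "compact": bool(flags & COMPACT_FLAG),
        "mode": mode,
        # bits 2-3 of the mode byte hold the target HLL type
        "tgt_type": TGT_HLL_TYPES.get((mode_byte >> 2) & 0x3, "UNKNOWN"),
    }
    if mode == "LIST":
        # LIST mode: byte 6 holds the number of stored coupons.
        info["coupons"] = byte6
    elif mode == "SET":
        # SET mode: a 4-byte little-endian coupon count follows the first 8 bytes.
        (info["coupons"],) = struct.unpack_from("<i", sketch, 8)
    return info


if __name__ == "__main__":
    for row_id, b64 in SAMPLES.items():
        raw = base64.b64decode(b64)
        print(row_id, len(raw), "bytes", read_preamble(raw))
```

Under this assumed layout, row 1 reads as a compact LIST-mode sketch holding 7 coupons (36 bytes = 8-byte preamble + 7 × 4) and row 2 as a compact SET-mode sketch holding 10 coupons (52 bytes = 12-byte preamble + 10 × 4), both with `lgK` = 8 and target type HLL_8. If the two coupon sets do not overlap, the union holds 17 coupons, which is consistent with the rounded estimate in the first example's output; the sketch only reads preambles and does not check the coupon values for overlap.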
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]