This is an automated email from the ASF dual-hosted git repository. panxiaolei pushed a commit to branch dev_0825 in repository https://gitbox.apache.org/repos/asf/doris-website.git
commit d3a5ba87d5047eb803018538bc57c9529f0136a3 Author: BiteTheDDDDt <[email protected]> AuthorDate: Mon Aug 25 16:28:32 2025 +0800 update hll series/ kurt/ maxby/minby/median/quantile_union --- .../aggregate-functions/hll-raw-agg.md | 50 ++++++++++++---- .../aggregate-functions/hll-union-agg.md | 49 ++++++++++++---- .../sql-functions/aggregate-functions/kurt.md | 54 ++++++++--------- .../aggregate-functions/linear-histogram.md | 68 ++++++++++++++-------- .../sql-functions/aggregate-functions/max-by.md | 49 ++++++++++------ .../sql-functions/aggregate-functions/median.md | 33 ++++++++--- .../sql-functions/aggregate-functions/min-by.md | 49 ++++++++++------ .../aggregate-functions/quantile-union.md | 33 ++++++++--- .../aggregate-functions/hll-raw-agg.md | 49 +++++++++++++--- .../aggregate-functions/hll-union-agg.md | 46 ++++++++++++--- .../sql-functions/aggregate-functions/kurt.md | 48 +++++++-------- .../aggregate-functions/linear-histogram.md | 46 +++++++++++---- .../sql-functions/aggregate-functions/max-by.md | 42 ++++++++----- .../sql-functions/aggregate-functions/median.md | 51 +++++++++++++--- .../sql-functions/aggregate-functions/min-by.md | 43 +++++++++----- .../aggregate-functions/quantile-union.md | 30 +++++++--- 16 files changed, 526 insertions(+), 214 deletions(-) diff --git a/docs/sql-manual/sql-functions/aggregate-functions/hll-raw-agg.md b/docs/sql-manual/sql-functions/aggregate-functions/hll-raw-agg.md index 6b54436fffc..d56bf684f65 100644 --- a/docs/sql-manual/sql-functions/aggregate-functions/hll-raw-agg.md +++ b/docs/sql-manual/sql-functions/aggregate-functions/hll-raw-agg.md @@ -5,9 +5,10 @@ } --- + ## Description -The HLL_RAW_AGG function is an aggregate function, which is mainly used to merge multiple HyperLogLog data structures. +The HLL_RAW_AGG function is an aggregate function mainly used to merge multiple HyperLogLog data structures into one. ## Alias @@ -17,28 +18,57 @@ The HLL_RAW_AGG function is an aggregate function, which is mainly used to merge ```sql HLL_RAW_AGG(<hll>) +HLL_UNION(<hll>) ``` ## Parameters -| Parameters | Description | +| Parameter | Description | | -- | -- | -| `<hll>` | The HyperLogLog type expression to be calculated | +| `<hll>` | The expression to be calculated, type HLL supported. | ## Return Value -Returns the aggregated value of type HyperLogLog. +Returns the aggregated HLL type. +If there is no valid data in the group, returns HLL_EMPTY. ## Example ```sql -select HLL_CARDINALITY(HLL_RAW_AGG(uv_set)) from test_uv; +-- setup +create table test_uv( + id int, + uv_set string +) distributed by hash(id) buckets 1 +properties ("replication_num"="1"); +insert into test_uv values + (1, ('a')), + (1, ('b')), + (2, ('c')), + (2, ('d')), + (3, null); +``` + +```sql +select HLL_CARDINALITY(HLL_RAW_AGG(hll_hash(uv_set))) from test_uv; +``` + +```text ++------------------------------------------------+ +| HLL_CARDINALITY(HLL_RAW_AGG(hll_hash(uv_set))) | ++------------------------------------------------+ +| 4 | ++------------------------------------------------+ +``` + +```sql +select HLL_CARDINALITY(HLL_RAW_AGG(hll_hash(uv_set))) from test_uv where uv_set is null; ``` ```text -+------------------------------------------+ -| HLL_CARDINALITY(HLL_RAW_AGG(`uv_set`)) | -+------------------------------------------+ -| 17721 | -+------------------------------------------+ ++------------------------------------------------+ +| HLL_CARDINALITY(HLL_RAW_AGG(hll_hash(uv_set))) | ++------------------------------------------------+ +| 0 | ++------------------------------------------------+ ``` \ No newline at end of file diff --git a/docs/sql-manual/sql-functions/aggregate-functions/hll-union-agg.md b/docs/sql-manual/sql-functions/aggregate-functions/hll-union-agg.md index 50b6278ef90..3c5f439afae 100644 --- a/docs/sql-manual/sql-functions/aggregate-functions/hll-union-agg.md +++ b/docs/sql-manual/sql-functions/aggregate-functions/hll-union-agg.md @@ -5,9 +5,10 @@ } --- + ## Description -The HLL_UNION_AGG function is an aggregate function, which is mainly used to merge multiple HyperLogLog data structures and estimate the approximate value of the combined cardinality. +The HLL_UNION_AGG function is an aggregate function mainly used to merge multiple HyperLogLog data structures and estimate the approximate cardinality after merging. ## Syntax @@ -17,24 +18,52 @@ hll_union_agg(<hll>) ## Parameters -| Parameters | Description | +| Parameter | Description | | -- | -- | -| `<hll>` | The HyperLogLog type expression to be calculated | +| `<hll>` | The expression to be calculated, type HLL supported. | ## Return Value -Returns the cardinality value of type BIGINT. +Returns a BIGINT cardinality value. +If there is no valid data in the group, returns 0. ## Example ```sql -select HLL_UNION_AGG(uv_set) from test_uv; +-- setup +create table test_uv( + id int, + uv_set string +) distributed by hash(id) buckets 1 +properties ("replication_num"="1"); +insert into test_uv values + (1, ('a')), + (1, ('b')), + (2, ('c')), + (2, ('d')), + (3, null); +``` + +```sql +select HLL_UNION_AGG(HLL_HASH(uv_set)) from test_uv; +``` + +```text ++---------------------------------+ +| HLL_UNION_AGG(HLL_HASH(uv_set)) | ++---------------------------------+ +| 4 | ++---------------------------------+ +``` + +```sql +select HLL_UNION_AGG(HLL_HASH(uv_set)) from test_uv where uv_set is null; ``` ```text -+-------------------------+ -| HLL_UNION_AGG(`uv_set`) | -+-------------------------+ -| 17721 | -+-------------------------+ ++---------------------------------+ +| HLL_UNION_AGG(HLL_HASH(uv_set)) | ++---------------------------------+ +| 0 | ++---------------------------------+ ``` \ No newline at end of file diff --git a/docs/sql-manual/sql-functions/aggregate-functions/kurt.md b/docs/sql-manual/sql-functions/aggregate-functions/kurt.md index b1936bbb134..63627a4e700 100644 --- a/docs/sql-manual/sql-functions/aggregate-functions/kurt.md +++ b/docs/sql-manual/sql-functions/aggregate-functions/kurt.md @@ -5,48 +5,51 @@ } --- + ## Description -The KURTOSIS function returns the [kurtosis](https://en.wikipedia.org/wiki/Kurtosis) of the expr expression. -The forumula used for this function is `4-th centrol moment / ((variance)^2) - 3`. +The KURTOSIS function calculates the [kurtosis](https://en.wikipedia.org/wiki/Kurtosis) of the data. The formula used is: fourth central moment / (variance squared) - 3. ## Alias -KURT_POP,KURTOSIS +KURT_POP, KURTOSIS ## Syntax ```sql KURTOSIS(<expr>) +KURT_POP(<expr>) +KURT(<expr>) ``` ## Parameters -| Parameters | Description | +| Parameter | Description | | -- | -- | -| `<expr>` | The expression needs to be obtained | +| `<expr>` | The expression to calculate, type Double supported. | ## Return Value -Returns a value of type DOUBLE. Special cases: -- Returns NULL when the variance is zero. +Returns a DOUBLE value. +Returns NULL when variance is zero. +Returns NULL when there is no valid data in the group. ## Example ```sql -select * from statistic_test; -``` - -```text -+-----+------+------+ -| tag | val1 | val2 | -+-----+------+------+ -| 1 | -10 | -10| -| 2 | -20 | NULL| -| 3 | 100 | NULL| -| 4 | 100 | NULL| -| 5 | 1000 | 1000| -+-----+------+------+ +-- setup +create table statistic_test( + tag int, + val1 double, + val2 double +) distributed by hash(tag) buckets 1 +properties ("replication_num"="1"); +insert into statistic_test values + (1, -10, -10), + (2, -20, null), + (3, 100, null), + (4, 100, null), + (5, 1000, 1000); ``` ```sql @@ -54,15 +57,14 @@ select kurt(val1), kurt(val2) from statistic_test; ``` ```text -+-------------------+--------------------+ -| kurt(val1) | kurt(val2) | -+-------------------+--------------------+ -| 0.162124583734851 | -1.3330994719286338 | -+-------------------+--------------------+ ++---------------------+------------+ +| kurt(val1) | kurt(val2) | ++---------------------+------------+ +| 0.16212458373485106 | -2 | ++---------------------+------------+ ``` ```sql -// Each group just has one row, result is NULL select kurt(val1), kurt(val2) from statistic_test group by tag; ``` diff --git a/docs/sql-manual/sql-functions/aggregate-functions/linear-histogram.md b/docs/sql-manual/sql-functions/aggregate-functions/linear-histogram.md index 05fc2abfe50..a3bbaa65e0a 100644 --- a/docs/sql-manual/sql-functions/aggregate-functions/linear-histogram.md +++ b/docs/sql-manual/sql-functions/aggregate-functions/linear-histogram.md @@ -5,39 +5,50 @@ } --- + ## Description -The LINEAR_HISTOGRAM function is used to describe the data distribution. It uses an "equal width" bucking strategy, and divides the data into buckets according to the value of the data. +The LINEAR_HISTOGRAM function is used to describe data distribution. It uses an "equal width" bucketing strategy and divides the data into buckets according to the value size. ## Syntax ```sql -`LINEAR_HISTOGRAM(<expr>, DOUBLE <interval>[, DOUBLE <offset>)` +LINEAR_HISTOGRAM(<expr>, DOUBLE <interval>[, DOUBLE <offset>]) ``` ## Parameters -| Parameters | Description | +| Parameter | Description | | -- | -- | -| `interval` | Required. The width of the bucket. | -| `offset` | Optional. The default value is 0, and the range is `[0, interval)`. | +| `interval` | The width of the bucket, supports types: TinyInt, SmallInt, Int, BigInt, LargeInt, Float, Double, Decimal. | +| `offset` | Optional. Default is 0, range is `[0, interval)`, type Double supported. | ## Return Value -Returns a value of the computed JSON type. +Returns a computed value of JSON type. ## Example +```sql +-- setup +create table histogram_test( + a int +) distributed by hash(a) buckets 1 +properties ("replication_num"="1"); +insert into histogram_test values + (0), (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), (11), (null); +``` + ```sql select linear_histogram(a, 2) from histogram_test; ``` ```text -+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| linear_histogram(a, cast(2 as DOUBLE)) | -+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| {"num_buckets":6,"buckets":[{"lower":0.0,"upper":2.0,"count":2,"acc_count":2},{"lower":2.0,"upper":4.0,"count":4,"acc_count":6},{"lower":4.0,"upper":6.0,"count":4,"acc_count":10},{"lower":6.0,"upper":8.0,"count":4,"acc_count":14},{"lower":8.0,"upper":10.0,"count":4,"acc_count":18},{"lower":10.0,"upper":12.0,"count":2,"acc_count":20}]} | -+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ ++------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| linear_histogram(a, 2) | ++------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| {"num_buckets":6,"buckets":[{"lower":0.0,"upper":2.0,"count":2,"acc_count":2},{"lower":2.0,"upper":4.0,"count":2,"acc_count":4},{"lower":4.0,"upper":6.0,"count":2,"acc_count":6},{"lower":6.0,"upper":8.0,"count":2,"acc_count":8},{"lower":8.0,"upper":10.0,"count":2,"acc_count":10},{"lower":10.0,"upper":12.0,"count":2,"acc_count":12}]} | ++------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ ``` ```sql @@ -45,20 +56,29 @@ select linear_histogram(a, 2, 1) from histogram_test; ``` ```text -+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| linear_histogram(a, cast(2 as DOUBLE), cast(1 as DOUBLE)) | -+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| {"num_buckets":5,"buckets":[{"lower":1.0,"upper":3.0,"count":4,"acc_count":4},{"lower":3.0,"upper":5.0,"count":4,"acc_count":8},{"lower":5.0,"upper":7.0,"count":4,"acc_count":12},{"lower":7.0,"upper":9.0,"count":4,"acc_count":16},{"lower":9.0,"upper":11.0,"count":4,"acc_count":20}]} | -+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ ++---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| linear_histogram(a, 2, 1) | ++---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| {"num_buckets":7,"buckets":[{"lower":-1.0,"upper":1.0,"count":1,"acc_count":1},{"lower":1.0,"upper":3.0,"count":2,"acc_count":3},{"lower":3.0,"upper":5.0,"count":2,"acc_count":5},{"lower":5.0,"upper":7.0,"count":2,"acc_count":7},{"lower":7.0,"upper":9.0,"count":2,"acc_count":9},{"lower":9.0,"upper":11.0,"count":2,"acc_count":11},{"lower":11.0,"upper":13.0,"count":1,"acc_count":12}]} | ++---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +``` + +```sql +select linear_histogram(a, 2, 1) from histogram_test where a is null; ``` ```text -Field description: - -- `num_buckets`: The number of buckets. -- `buckets`: All buckets. - - `lower`: Lower bound of the bucket. (included) - - `upper`: Upper bound of the bucket. (not included) - - `count`: The number of elements contained in the bucket. - - `acc_count`: Accumulated count. ++--------------------------------+ +| linear_histogram(a, 2, 1) | ++--------------------------------+ +| {"num_buckets":0,"buckets":[]} | ``` + +Field description: + +- `num_buckets`: Number of buckets. +- `buckets`: Buckets in the histogram. + - `lower`: Lower bound (inclusive). + - `upper`: Upper bound (exclusive). + - `count`: Number of elements in the bucket. + - `acc_count`: Accumulated count up to this bucket. diff --git a/docs/sql-manual/sql-functions/aggregate-functions/max-by.md b/docs/sql-manual/sql-functions/aggregate-functions/max-by.md index fd8ce43ec0b..ce9d96cb9c3 100644 --- a/docs/sql-manual/sql-functions/aggregate-functions/max-by.md +++ b/docs/sql-manual/sql-functions/aggregate-functions/max-by.md @@ -5,9 +5,10 @@ } --- + ## Description -The MAX_BY function is used to return the corresponding associated value based on the maximum value of the specified column. +The MAX_BY function returns the associated value based on the maximum value of the specified column. ## Syntax @@ -17,30 +18,32 @@ MAX_BY(<expr1>, <expr2>) ## Parameters -| Parameters | Description | +| Parameter | Description | | -- | -- | -| `<expr1>` | The expression used to specify the corresponding association. | -| `<expr2>` | The expression used to specify the maximum value for statistics. | +| `<expr1>` | The expression for the associated value, supports types: Bool, TinyInt, SmallInt, Int, BigInt, LargeInt, Float, Double, Decimal, String, Date, Datetime. | +| `<expr2>` | The expression for the maximum value, supports types: Bool, TinyInt, SmallInt, Int, BigInt, LargeInt, Float, Double, Decimal, String, Date, Datetime. | ## Return Value -Returns the same data type as the input expression <expr1>. +Returns the same data type as <expr1>. +Returns NULL if there is no valid data in the group. ## Example ```sql -select * from tbl; -``` - -```text -+------+------+------+------+ -| k1 | k2 | k3 | k4 | -+------+------+------+------+ -| 0 | 3 | 2 | 100 | -| 1 | 2 | 3 | 4 | -| 4 | 3 | 2 | 1 | -| 3 | 4 | 2 | 1 | -+------+------+------+------+ +-- setup +create table tbl( + k1 int, + k2 int, + k3 int, + k4 int +) distributed by hash(k1) buckets 1 +properties ("replication_num"="1"); +insert into tbl values + (0, 3, 2, 100), + (1, 2, 3, 4), + (4, 3, 2, 1), + (3, 4, 2, 1); ``` ```sql @@ -54,3 +57,15 @@ select max_by(k1, k4) from tbl; | 0 | +--------------------+ ``` + +```sql +select max_by(k1, k4) from tbl where k1 is null; +``` + +```text ++----------------+ +| max_by(k1, k4) | ++----------------+ +| NULL | ++----------------+ +``` diff --git a/docs/sql-manual/sql-functions/aggregate-functions/median.md b/docs/sql-manual/sql-functions/aggregate-functions/median.md index abcbb99c622..dee74249fb9 100644 --- a/docs/sql-manual/sql-functions/aggregate-functions/median.md +++ b/docs/sql-manual/sql-functions/aggregate-functions/median.md @@ -5,9 +5,10 @@ } --- + ## Description -The MEDIAN function returns the median of the expression. +The MEDIAN function returns the median of the expression, equivalent to percentile(expr, 0.5). ## Syntax @@ -17,24 +18,40 @@ MEDIAN(<expr>) ## Parameters -| Parameters | Description | +| Parameter | Description | | -- | -- | -| `<expr>` | The expression needs to be obtained | +| `<expr>` | The expression to calculate, supports types: Double, Float, LargeInt, BigInt, Int, SmallInt, TinyInt. | ## Return Value Returns the same data type as the input expression. +Returns NULL if there is no valid data in the group. ## Example +```sql +select datetime, median(scan_rows) from log_statis group by datetime; +``` + +```text +select datetime, median(scan_rows) from log_statis group by datetime; ++---------------------+-------------------+ +| datetime | median(scan_rows) | ++---------------------+-------------------+ +| 2025-08-25 10:00:00 | 50 | +| 2025-08-25 11:00:00 | 30 | ++---------------------+-------------------+ +``` + ```sql select median(scan_rows) from log_statis group by datetime; ``` ```text -+---------------------+ -| median(`scan_rows`) | -+---------------------+ -| 50 | -+---------------------+ +select median(scan_rows) from log_statis where scan_rows is null; ++-------------------+ +| median(scan_rows) | ++-------------------+ +| NULL | ++-------------------+ ``` \ No newline at end of file diff --git a/docs/sql-manual/sql-functions/aggregate-functions/min-by.md b/docs/sql-manual/sql-functions/aggregate-functions/min-by.md index d985c1247d0..3dc76fd1e40 100644 --- a/docs/sql-manual/sql-functions/aggregate-functions/min-by.md +++ b/docs/sql-manual/sql-functions/aggregate-functions/min-by.md @@ -5,9 +5,10 @@ } --- + ## Description -The MIN_BY function is used to return the corresponding associated value based on the minimum value of the specified column. +The MIN_BY function returns the associated value based on the minimum value of the specified column. ## Syntax @@ -17,30 +18,32 @@ MIN_BY(<expr1>, <expr2>) ## Parameters -| Parameters | Description | +| Parameter | Description | | -- | -- | -| `<expr1>` | The expression used to specify the corresponding association. | -| `<expr2>` | The expression used to specify the minimum value for statistics. | +| `<expr1>` | The expression for the associated value, supports types: Bool, TinyInt, SmallInt, Int, BigInt, LargeInt, Float, Double, Decimal, String, Date, Datetime. | +| `<expr2>` | The expression for the minimum value, supports types: Bool, TinyInt, SmallInt, Int, BigInt, LargeInt, Float, Double, Decimal, String, Date, Datetime. | ## Return Value -Returns the same data type as the input expression <expr1>. +Returns the same data type as <expr1>. +Returns NULL if there is no valid data in the group. ## Example ```sql -select * from tbl; -``` - -```text -+------+------+------+------+ -| k1 | k2 | k3 | k4 | -+------+------+------+------+ -| 0 | 3 | 2 | 100 | -| 1 | 2 | 3 | 4 | -| 4 | 3 | 2 | 1 | -| 3 | 4 | 2 | 1 | -+------+------+------+------+ +-- setup +create table tbl( + k1 int, + k2 int, + k3 int, + k4 int +) distributed by hash(k1) buckets 1 +properties ("replication_num"="1"); +insert into tbl values + (0, 3, 2, 100), + (1, 2, 3, 4), + (4, 3, 2, 1), + (3, 4, 2, 1); ``` ```sql @@ -54,3 +57,15 @@ select min_by(k1, k4) from tbl; | 4 | +--------------------+ ``` + +```sql +select min_by(k1, k4) from tbl where k1 is null; +``` + +```text ++----------------+ +| min_by(k1, k4) | ++----------------+ +| NULL | ++----------------+ +``` diff --git a/docs/sql-manual/sql-functions/aggregate-functions/quantile-union.md b/docs/sql-manual/sql-functions/aggregate-functions/quantile-union.md index efac2ca1dd4..b7901547275 100644 --- a/docs/sql-manual/sql-functions/aggregate-functions/quantile-union.md +++ b/docs/sql-manual/sql-functions/aggregate-functions/quantile-union.md @@ -5,9 +5,10 @@ } --- + ## Description -The `QUANTILE_UNION` function is used to merge intermediate results from multiple quantile calculations. This function typically works in conjunction with `QUANTILE_STATE` and is particularly useful in scenarios requiring multi-stage quantile calculations. +The `QUANTILE_UNION` function is used to merge multiple intermediate results of quantile calculations. This function is usually used together with `QUANTILE_STATE`, especially suitable for scenarios requiring multi-stage quantile calculation. ## Syntax @@ -19,16 +20,17 @@ QUANTILE_UNION(<query_state>) | Parameter | Description | | -- | -- | -| `<query_state>` | The intermediate state generated by the `TO_QUANTILE_STATE` function. | +| `<query_state>` | The data to be aggregated, type QuantileState supported. | ## Return Value -Returns an aggregation state that can be used for further quantile calculations. The result of this function remains a `QUANTILE_STATE`. +Returns an aggregation state for further quantile calculation, type QuantileState. +Returns NULL if there is no valid data in the group. ## Example ```sql --- Create sample table +-- setup CREATE TABLE response_times ( request_id INT, response_time DOUBLE, @@ -38,8 +40,6 @@ DISTRIBUTED BY HASH(request_id) BUCKETS AUTO PROPERTIES ( "replication_allocation" = "tag.location.default: 1" ); - --- Insert sample data INSERT INTO response_times VALUES (1, 10.5, 'east'), (2, 15.2, 'east'), @@ -51,8 +51,9 @@ INSERT INTO response_times VALUES (8, 45.9, 'east'), (9, 50.4, 'west'), (10, 100.6, 'east'); +``` --- Calculate 50th percentile of response times by region +```sql SELECT region, QUANTILE_PERCENT( @@ -63,9 +64,10 @@ SELECT ) AS median_response_time FROM response_times GROUP BY region; - ``` +Calculate the 50th percentile of response times by region. + ```text +--------+----------------------+ | region | median_response_time | @@ -74,3 +76,18 @@ GROUP BY region; | east | 30.75 | +--------+----------------------+ ``` + +```sql +SELECT QUANTILE_UNION(TO_QUANTILE_STATE(response_time, 2048)) +FROM response_times where response_time is null; +``` + +Returns NULL if there is no valid data in the group. + +```text ++--------------------------------------------------------+ +| QUANTILE_UNION(TO_QUANTILE_STATE(response_time, 2048)) | ++--------------------------------------------------------+ +| NULL | ++--------------------------------------------------------+ +``` diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/hll-raw-agg.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/hll-raw-agg.md index 5c518f2f9e1..ae2705bbf7a 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/hll-raw-agg.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/hll-raw-agg.md @@ -7,7 +7,7 @@ ## 描述 -HLL_RAW_AGG 函数是一种聚合函数,主要用于将多个 HyperLogLog 数据结构合并成一个 +HLL_RAW_AGG 函数是一种聚合函数,主要用于将多个 HyperLogLog 数据结构合并成一个。 ## 别名 @@ -17,27 +17,58 @@ HLL_RAW_AGG 函数是一种聚合函数,主要用于将多个 HyperLogLog 数 ```sql HLL_RAW_AGG(<hll>) +HLL_UNION(<hll>) ``` ## 参数 | 参数 | 说明 | | -- | -- | -| `<hll>` | 需要被计算HyperLogLog类型表达式 | +| `<hll>` | 需要被计算的表达式,支持类型为 HLL 。 | ## 返回值 返回被聚合后的 HLL 类型。 +如果组内没有合法数据则返回 HLL_EMPTY ; ## 举例 + +```sql +-- setup +create table test_uv( + id int, + uv_set string +) distributed by hash(id) buckets 1 +properties ("replication_num"="1"); +insert into test_uv values + (1, ('a')), + (1, ('b')), + (2, ('c')), + (2, ('d')), + (3, null); +``` + + +```sql +select HLL_CARDINALITY(HLL_RAW_AGG(hll_hash(uv_set))) from test_uv; +``` + +```text ++------------------------------------------------+ +| HLL_CARDINALITY(HLL_RAW_AGG(hll_hash(uv_set))) | ++------------------------------------------------+ +| 4 | ++------------------------------------------------+ +``` + ```sql -select HLL_CARDINALITY(HLL_RAW_AGG(uv_set)) from test_uv; +select HLL_CARDINALITY(HLL_RAW_AGG(hll_hash(uv_set))) from test_uv where uv_set is null; ``` ```text -+------------------------------------------+ -| HLL_CARDINALITY(HLL_RAW_AGG(`uv_set`)) | -+------------------------------------------+ -| 17721 | -+------------------------------------------+ -``` \ No newline at end of file ++------------------------------------------------+ +| HLL_CARDINALITY(HLL_RAW_AGG(hll_hash(uv_set))) | ++------------------------------------------------+ +| 0 | ++------------------------------------------------+ +``` diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/hll-union-agg.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/hll-union-agg.md index cf3d639e645..c4d8f9883fe 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/hll-union-agg.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/hll-union-agg.md @@ -20,21 +20,53 @@ hll_union_agg(<hll>) | 参数 | 说明 | | -- | -- | -| `<hll>` | 需要被计算 HyperLogLog 类型表达式 | +| `<hll>` | 需要被计算的表达式,支持类型为 HLL 。| ## 返回值 返回 BIGINT 类型的基数值。 +如果组内没有合法数据则返回 0 ; ## 举例 + ```sql -select HLL_UNION_AGG(uv_set) from test_uv; +-- setup +create table test_uv( + id int, + uv_set string +) distributed by hash(id) buckets 1 +properties ("replication_num"="1"); +insert into test_uv values + (1, ('a')), + (1, ('b')), + (2, ('c')), + (2, ('d')), + (3, null); +``` + + +```sql +select HLL_UNION_AGG(HLL_HASH(uv_set)) from test_uv; ``` ```text -+-------------------------+ -| HLL_UNION_AGG(`uv_set`) | -+-------------------------+ -| 17721 | -+-------------------------+ ++---------------------------------+ +| HLL_UNION_AGG(HLL_HASH(uv_set)) | ++---------------------------------+ +| 4 | ++---------------------------------+ ``` + + +```sql +select HLL_UNION_AGG(HLL_HASH(uv_set)) from test_uv where uv_set is null; +``` + +```text ++---------------------------------+ +| HLL_UNION_AGG(HLL_HASH(uv_set)) | ++---------------------------------+ +| 0 | ++---------------------------------+ +``` + diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/kurt.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/kurt.md index a55945e3de5..38fff78135a 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/kurt.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/kurt.md @@ -17,35 +17,38 @@ KURT_POP,KURTOSIS ```sql KURTOSIS(<expr>) +KURT_POP(<expr>) +KURT(<expr>) ``` ## 参数说明 | 参数 | 说明 | | -- | -- | -| `<expr>` | 需要获取值的表达式 | +| `<expr>` | 需要获取值的表达式,支持类型为 Double 。 | ## 返回值 -返回 DOUBLE 类型的值。特殊情况: - -- 当方差为零时,返回 NULL +返回 DOUBLE 类型的值。 +当方差为零时,返回 NULL 。 +组内没有合法数据时,返回 NULL 。 ## 举例 -```sql -select * from statistic_test; -``` -```text -+-----+------+------+ -| tag | val1 | val2 | -+-----+------+------+ -| 1 | -10 | -10| -| 2 | -20 | NULL| -| 3 | 100 | NULL| -| 4 | 100 | NULL| -| 5 | 1000 | 1000| -+-----+------+------+ +```sql +-- setup +create table statistic_test( + tag int, + val1 double, + val2 double +) distributed by hash(tag) buckets 1 +properties ("replication_num"="1"); +insert into statistic_test values + (1, -10, -10), + (2, -20, null), + (3, 100, null), + (4, 100, null), + (5, 1000, 1000); ``` ```sql @@ -53,15 +56,14 @@ select kurt(val1), kurt(val2) from statistic_test; ``` ```text -+-------------------+--------------------+ -| kurt(val1) | kurt(val2) | -+-------------------+--------------------+ -| 0.162124583734851 | -1.3330994719286338 | -+-------------------+--------------------+ ++---------------------+------------+ +| kurt(val1) | kurt(val2) | ++---------------------+------------+ +| 0.16212458373485106 | -2 | ++---------------------+------------+ ``` ```sql -// 每组只有一行数据,结果为 NULL select kurt(val1), kurt(val2) from statistic_test group by tag; ``` diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/linear-histogram.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/linear-histogram.md index 95265b7406e..1e6246a756d 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/linear-histogram.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/linear-histogram.md @@ -19,8 +19,8 @@ LINEAR_HISTOGRAM 函数用于描述数据分布情况,它使用“等宽”的 | 参数 | 说明 | | -- | -- | -| `interval` | 必须。桶的宽度 | -| `offset` | 可选。默认为 0,范围是 `[0, interval)` | +| `interval` | 桶的宽度,支持类型为 TinyInt,SmallInt,Int,BigtInt,LargeInt,Float,Double,Decimal。 | +| `offset` | 可选。默认为 0,范围是 `[0, interval)`,支持类型为Double。 | ## 返回值 @@ -28,16 +28,27 @@ LINEAR_HISTOGRAM 函数用于描述数据分布情况,它使用“等宽”的 ## 举例 +```sql +-- setup +create table histogram_test( + a int +) distributed by hash(a) buckets 1 +properties ("replication_num"="1"); +insert into histogram_test values + (0), (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), (11), (null); +``` + + ```sql select linear_histogram(a, 2) from histogram_test; ``` ```text -+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| linear_histogram(a, cast(2 as DOUBLE)) | -+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| {"num_buckets":6,"buckets":[{"lower":0.0,"upper":2.0,"count":2,"acc_count":2},{"lower":2.0,"upper":4.0,"count":4,"acc_count":6},{"lower":4.0,"upper":6.0,"count":4,"acc_count":10},{"lower":6.0,"upper":8.0,"count":4,"acc_count":14},{"lower":8.0,"upper":10.0,"count":4,"acc_count":18},{"lower":10.0,"upper":12.0,"count":2,"acc_count":20}]} | -+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ ++------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| linear_histogram(a, 2) | ++------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| {"num_buckets":6,"buckets":[{"lower":0.0,"upper":2.0,"count":2,"acc_count":2},{"lower":2.0,"upper":4.0,"count":2,"acc_count":4},{"lower":4.0,"upper":6.0,"count":2,"acc_count":6},{"lower":6.0,"upper":8.0,"count":2,"acc_count":8},{"lower":8.0,"upper":10.0,"count":2,"acc_count":10},{"lower":10.0,"upper":12.0,"count":2,"acc_count":12}]} | ++------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ ``` ```sql @@ -45,11 +56,22 @@ select linear_histogram(a, 2, 1) from histogram_test; ``` ```text -+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| linear_histogram(a, cast(2 as DOUBLE), cast(1 as DOUBLE)) | -+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| {"num_buckets":5,"buckets":[{"lower":1.0,"upper":3.0,"count":4,"acc_count":4},{"lower":3.0,"upper":5.0,"count":4,"acc_count":8},{"lower":5.0,"upper":7.0,"count":4,"acc_count":12},{"lower":7.0,"upper":9.0,"count":4,"acc_count":16},{"lower":9.0,"upper":11.0,"count":4,"acc_count":20}]} | -+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ ++---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| linear_histogram(a, 2, 1) | ++---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| {"num_buckets":7,"buckets":[{"lower":-1.0,"upper":1.0,"count":1,"acc_count":1},{"lower":1.0,"upper":3.0,"count":2,"acc_count":3},{"lower":3.0,"upper":5.0,"count":2,"acc_count":5},{"lower":5.0,"upper":7.0,"count":2,"acc_count":7},{"lower":7.0,"upper":9.0,"count":2,"acc_count":9},{"lower":9.0,"upper":11.0,"count":2,"acc_count":11},{"lower":11.0,"upper":13.0,"count":1,"acc_count":12}]} | ++---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +``` + +```sql +select linear_histogram(a, 2, 1) from histogram_test where a is null; +``` + +```text ++--------------------------------+ +| linear_histogram(a, 2, 1) | ++--------------------------------+ +| {"num_buckets":0,"buckets":[]} | ``` ```text diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/max-by.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/max-by.md index 936a30b4fd8..515f0dca0c3 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/max-by.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/max-by.md @@ -19,28 +19,30 @@ MAX_BY(<expr1>, <expr2>) | 参数 | 说明 | | -- | -- | -| `<expr1>` | 用于指定对应关联的表达式。 | -| `<expr2>` | 用于指定最大值统计的表达式。 | +| `<expr1>` | 用于指定对应关联的表达式,支持类型为 Bool, TinyInt,SmallInt,Int,BigtInt,LargeInt,Float,Double,Decimal, String, Date, Datetime。 | +| `<expr2>` | 用于指定最大值统计的表达式,支持类型为 Bool, TinyInt,SmallInt,Int,BigtInt,LargeInt,Float,Double,Decimal, String, Date, Datetime。 | ## 返回值 返回与输入表达式 <expr1> 相同的数据类型。 +如果组内没有合法数据,则返回 NULL 。 ## 举例 ```sql -select * from tbl; -``` - -```text -+------+------+------+------+ -| k1 | k2 | k3 | k4 | -+------+------+------+------+ -| 0 | 3 | 2 | 100 | -| 1 | 2 | 3 | 4 | -| 4 | 3 | 2 | 1 | -| 3 | 4 | 2 | 1 | -+------+------+------+------+ +-- setup +create table tbl( + k1 int, + k2 int, + k3 int, + k4 int +) distributed by hash(k1) buckets 1 +properties ("replication_num"="1"); +insert into tbl values + (0, 3, 2, 100), + (1, 2, 3, 4), + (4, 3, 2, 1), + (3, 4, 2, 1); ``` ```sql @@ -54,3 +56,15 @@ select max_by(k1, k4) from tbl; | 0 | +--------------------+ ``` + +```sql +select max_by(k1, k4) from tbl where k1 is null; +``` + +```text ++----------------+ +| max_by(k1, k4) | ++----------------+ +| NULL | ++----------------+ +``` diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/median.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/median.md index 7c67c340164..c7e8d4bf9ec 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/median.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/median.md @@ -7,7 +7,7 @@ ## 描述 -MEDIAN 函数返回表达式的中位数 +MEDIAN 函数返回表达式的中位数,等价于 percentile(expr, 0.5)。 ## 语法 @@ -19,21 +19,56 @@ MEDIAN(<expr>) | 参数 | 说明 | | -- | -- | -| `<expr>` | 需要获取值的表达式 | +| `<expr>` | 需要获取值的表达式,支持类型:Double、Float、LargeInt、BigInt、Int、SmallInt、TinyInt。 | ## 返回值 返回与输入表达式相同的数据类型。 +如果组内没有合法数据,则返回 NULL 。 ## 举例 + +```sql +-- setup +create table log_statis( + datetime datetime, + scan_rows int +) distributed by hash(datetime) buckets 1 +properties ("replication_num"="1"); +insert into log_statis values + ('2025-08-25 10:00:00', 10), + ('2025-08-25 10:00:00', 50), + ('2025-08-25 10:00:00', 100), + ('2025-08-25 11:00:00', 20), + ('2025-08-25 11:00:00', 30), + ('2025-08-25 11:00:00', 40); +``` + +```sql +select datetime,median(scan_rows) from log_statis group by datetime; +``` + +```text +select datetime, median(scan_rows) from log_statis group by datetime; ++---------------------+-------------------+ +| datetime | median(scan_rows) | ++---------------------+-------------------+ +| 2025-08-25 10:00:00 | 50 | +| 2025-08-25 11:00:00 | 30 | ++---------------------+-------------------+ +``` + ```sql select median(scan_rows) from log_statis group by datetime; ``` ```text -+---------------------+ -| median(`scan_rows`) | -+---------------------+ -| 50 | -+---------------------+ -``` \ No newline at end of file +select median(scan_rows) from log_statis where scan_rows is null; ++-------------------+ +| median(scan_rows) | ++-------------------+ +| NULL | ++-------------------+ +``` + + diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/min-by.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/min-by.md index e2a17165541..1ec685d51aa 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/min-by.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/min-by.md @@ -19,27 +19,30 @@ MIN_BY(<expr1>, <expr2>) | 参数 | 说明 | | -- | -- | -| `<expr1>` | 用于指定对应关联的表达式。 | -| `<expr2>` | 用于指定最小值统计的表达式。 | +| `<expr1>` | 用于指定对应关联的表达式,支持类型为 Bool, TinyInt,SmallInt,Int,BigtInt,LargeInt,Float,Double,Decimal, String, Date, Datetime。 | +| `<expr2>` | 用于指定最大值统计的表达式,支持类型为 Bool, TinyInt,SmallInt,Int,BigtInt,LargeInt,Float,Double,Decimal, String, Date, Datetime。 | + ## 返回值 返回与输入表达式 <expr1> 相同的数据类型。 +如果组内没有合法数据,则返回 NULL 。 ## 举例 ```sql -select * from tbl; -``` - -```text -+------+------+------+------+ -| k1 | k2 | k3 | k4 | -+------+------+------+------+ -| 0 | 3 | 2 | 100 | -| 1 | 2 | 3 | 4 | -| 4 | 3 | 2 | 1 | -| 3 | 4 | 2 | 1 | -+------+------+------+------+ +-- setup +create table tbl( + k1 int, + k2 int, + k3 int, + k4 int +) distributed by hash(k1) buckets 1 +properties ("replication_num"="1"); +insert into tbl values + (0, 3, 2, 100), + (1, 2, 3, 4), + (4, 3, 2, 1), + (3, 4, 2, 1); ``` ```sql @@ -53,3 +56,15 @@ select min_by(k1, k4) from tbl; | 4 | +--------------------+ ``` + +```sql +select min_by(k1, k4) from tbl where k1 is null; +``` + +```text ++----------------+ +| min_by(k1, k4) | ++----------------+ +| NULL | ++----------------+ +``` diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/quantile-union.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/quantile-union.md index 6bce528c889..c53b858e596 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/quantile-union.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/quantile-union.md @@ -19,16 +19,17 @@ QUANTILE_UNION(<query_state>) | 参数 | 说明 | | -- | -- | -| `<query_state>` | 由 `TO_QUANTILE_STATE` 函数生成的中间状态 | +| `<query_state>` | 需要聚合的数据,支持类型为 QuantileState。 | ## 返回值 -返回一个可以用于进一步分位数计算的聚合状态。此函数返回的结果仍是 `QUANTILE_STATE`。 +返回一个可以用于进一步分位数计算的聚合状态,类型为 QuantileState 。 +组内没有合法数据时返回 NULL。 ## 举例 ```sql --- 创建示例表 +-- setup CREATE TABLE response_times ( request_id INT, response_time DOUBLE, @@ -38,8 +39,6 @@ DISTRIBUTED BY HASH(request_id) BUCKETS AUTO PROPERTIES ( "replication_allocation" = "tag.location.default: 1" ); - --- 插入示例数据 INSERT INTO response_times VALUES (1, 10.5, 'east'), (2, 15.2, 'east'), @@ -51,8 +50,9 @@ INSERT INTO response_times VALUES (8, 45.9, 'east'), (9, 50.4, 'west'), (10, 100.6, 'east'); +``` --- 按区域计算响应时间的 50% 分位数 +```sql SELECT region, QUANTILE_PERCENT( @@ -63,9 +63,10 @@ SELECT ) AS median_response_time FROM response_times GROUP BY region; - ``` +按区域计算响应时间的 50% 分位数。 + ```text +--------+----------------------+ | region | median_response_time | @@ -74,3 +75,18 @@ GROUP BY region; | east | 30.75 | +--------+----------------------+ ``` + +```sql +SELECT QUANTILE_UNION(TO_QUANTILE_STATE(response_time, 2048)) +FROM response_times where response_time is null; +``` + +组内没有合法数据时返回 NULL。 + +```text ++--------------------------------------------------------+ +| QUANTILE_UNION(TO_QUANTILE_STATE(response_time, 2048)) | ++--------------------------------------------------------+ +| NULL | ++--------------------------------------------------------+ +``` --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
