This is an automated email from the ASF dual-hosted git repository. panxiaolei pushed a commit to branch dev_0818 in repository https://gitbox.apache.org/repos/asf/doris-website.git
commit ef0f356622d90a8c048836e535f67227473bd636 Author: BiteTheDDDDt <[email protected]> AuthorDate: Mon Aug 18 17:54:14 2025 +0800 update doc of percentile series/regr series --- ...x_weighted.md => percentile-approx-weighted.md} | 36 +++++++++++-- .../aggregate-functions/percentile-approx.md | 43 ++++++++++++--- .../aggregate-functions/percentile-array.md | 55 ++++++++++++++----- .../aggregate-functions/percentile.md | 45 +++++++++++++--- .../aggregate-functions/regr-intercept.md | 32 +++++++---- .../aggregate-functions/regr-slope.md | 39 ++++++++++---- .../aggregate-functions/regr_intercept.md | 62 ---------------------- .../aggregate-functions/regr_slope.md | 62 ---------------------- ...x_weighted.md => percentile-approx-weighted.md} | 46 ++++++++++++---- .../aggregate-functions/percentile-approx.md | 43 ++++++++++++--- .../aggregate-functions/percentile-array.md | 54 ++++++++++++++----- .../aggregate-functions/percentile.md | 45 +++++++++++++--- .../aggregate-functions/regr-intercept.md | 30 ++++++++--- .../aggregate-functions/regr-slope.md | 38 +++++++++---- .../aggregate-functions/regr_intercept.md | 62 ---------------------- .../aggregate-functions/regr_slope.md | 62 ---------------------- sidebars.json | 4 +- 17 files changed, 407 insertions(+), 351 deletions(-) diff --git a/docs/sql-manual/sql-functions/aggregate-functions/percentile_approx_weighted.md b/docs/sql-manual/sql-functions/aggregate-functions/percentile-approx-weighted.md similarity index 63% rename from docs/sql-manual/sql-functions/aggregate-functions/percentile_approx_weighted.md rename to docs/sql-manual/sql-functions/aggregate-functions/percentile-approx-weighted.md index 00c63988968..843d43e119e 100644 --- a/docs/sql-manual/sql-functions/aggregate-functions/percentile_approx_weighted.md +++ b/docs/sql-manual/sql-functions/aggregate-functions/percentile-approx-weighted.md @@ -24,14 +24,15 @@ PERCENTILE_APPROX_WEIGHTED(<col>, <weight>, <p> [, <compression>]) | Parameter | Description | | -- | -- | -| `<col>` | The column to calculate the percentile for | -| `<weight>` | Weight column, must be positive numbers | +| `<col>` | The column to calculate the percentile for. Supported type: Double | +| `<weight>` | Weight column, must be positive numbers. Supported type: Double | | `<p>` | Percentile value, range `[0.0, 1.0]`, e.g., `0.99` represents the `99`th percentile | -| `<compression>` | Optional parameter, compression ratio, range `[2048, 10000]`. The higher the value, the higher the precision, but the greater the memory consumption. If not specified or outside the range, use `10000`. | +| `<compression>` | Optional parameter, supported type: Double. Compression ratio, range `[2048, 10000]`. The higher the value, the higher the precision, but the greater the memory consumption. If not specified or outside the range, `10000` is used. | ## Return Value Return a `DOUBLE` type value, representing the calculated weighted approximate percentile. +If there is no valid data in the group, returns `NULL`. ## Examples @@ -78,3 +79,32 @@ FROM weighted_scores; ``` +```sql +select percentile_approx_weighted(if(score>95,score,null), weight, 0.9) from weighted_scores; +``` + +This will only consider non-NULL inputs. + +```text ++------------------------------------------------------------------+ +| percentile_approx_weighted(if(score>95,score,null), weight, 0.9) | ++------------------------------------------------------------------+ +| 95.5 | ++------------------------------------------------------------------+ +``` + +```sql +select percentile_approx_weighted(score, weight, 0.9, null) from weighted_scores; +``` + +If all input values are NULL, the function returns NULL. + +```text ++------------------------------------------------------+ +| percentile_approx_weighted(score, weight, 0.9, null) | ++------------------------------------------------------+ +| NULL | ++------------------------------------------------------+ +``` + + diff --git a/docs/sql-manual/sql-functions/aggregate-functions/percentile-approx.md b/docs/sql-manual/sql-functions/aggregate-functions/percentile-approx.md index 4d6c906cb4a..737ed1b5e78 100644 --- a/docs/sql-manual/sql-functions/aggregate-functions/percentile-approx.md +++ b/docs/sql-manual/sql-functions/aggregate-functions/percentile-approx.md @@ -24,18 +24,19 @@ PERCENTILE_APPROX(<col>, <p> [, <compression>]) | Parameter | Description | | -- | -- | -| `<col>` | The column to calculate percentiles for | -| `<p>` | Percentile value, range `[0.0, 1.0]`, e.g., `0.99` represents the `99th` percentile | -| `<compression>` | Optional parameter, compression level, range `[2048, 10000]`, higher values increase precision but consume more memory. If not specified or out of range, uses `10000` | +| `<col>` | The column to calculate percentiles for, supported type: Double. | +| `<p>` | Percentile value, constant, type Double, range `[0.0, 1.0]`, e.g., `0.99` means 99th percentile. Must be a constant. | +| `<compression>` | Optional, compression level, type Double, range `[2048, 10000]`. Higher values increase precision but consume more memory. If not specified or out of range, defaults to `10000`. | ## Return Value -Returns a `DOUBLE` value representing the calculated approximate percentile. +Returns the approximate percentile of the specified column, type Double. +If there is no valid data in the group, returns NULL. ## Examples ```sql --- Create sample table +-- setup CREATE TABLE response_times ( request_id INT, response_time DECIMAL(10, 2) @@ -44,8 +45,6 @@ DISTRIBUTED BY HASH(`request_id`) BUCKETS AUTO PROPERTIES ( "replication_allocation" = "tag.location.default: 1" ); - --- Insert sample data INSERT INTO response_times VALUES (1, 10.5), (2, 15.2), @@ -57,7 +56,9 @@ INSERT INTO response_times VALUES (8, 45.9), (9, 50.4), (10, 100.6); +``` +```sql -- Calculate 99th percentile using different compression levels SELECT percentile_approx(response_time, 0.99) as p99_default, -- Default compression @@ -74,4 +75,32 @@ FROM response_times; +-------------------+-------------------+-------------------+ ``` +```sql +SELECT percentile_approx(if(response_time>90,response_time,NULL), 0.5) FROM response_times; +``` + +Only non-NULL data is calculated. + +```text ++-----------------------------------------------------------------+ +| percentile_approx(if(response_time>90,response_time,NULL), 0.5) | ++-----------------------------------------------------------------+ +| 100.5999984741211 | ++-----------------------------------------------------------------+ +``` + +```sql +SELECT percentile_approx(NULL, 0.99) FROM response_times; +``` + +Returns NULL when all input data is NULL. + +```text ++-------------------------------+ +| percentile_approx(NULL, 0.99) | ++-------------------------------+ +| NULL | ++-------------------------------+ +``` + diff --git a/docs/sql-manual/sql-functions/aggregate-functions/percentile-array.md b/docs/sql-manual/sql-functions/aggregate-functions/percentile-array.md index db333cb783a..16754503c3a 100644 --- a/docs/sql-manual/sql-functions/aggregate-functions/percentile-array.md +++ b/docs/sql-manual/sql-functions/aggregate-functions/percentile-array.md @@ -25,17 +25,19 @@ PERCENTILE_ARRAY(<col>, <array_p>) | Parameter | Description | | -- | -- | -| `<col>` | The column to calculate the percentile for | -| `<array_p>` | Percentile array, each element must be in the range `[0.0, 1.0]`, e.g., `[0.5, 0.95, 0.99]` | +| `<col>` | The column to calculate the exact percentiles for. Supported types: Double, Float, LargeInt, BigInt, Int, SmallInt, TinyInt. | +| `<array_p>` | Percentile array, each element must be a constant of type Array<Double>, with values in the range `[0.0, 1.0]`, e.g., `[0.5, 0.95, 0.99]`. | ## Return Value -Return a `DOUBLE` type array, containing the calculated percentile values. +Returns a DOUBLE type array containing the calculated percentile values. +If there is no valid data in the group, returns an empty array. + ## Examples ```sql --- Create sample table +-- setup CREATE TABLE sales_data ( id INT, amount DECIMAL(10, 2) @@ -44,8 +46,6 @@ DISTRIBUTED BY HASH(`id`) BUCKETS AUTO PROPERTIES ( "replication_allocation" = "tag.location.default: 1" ); - --- Insert sample data INSERT INTO sales_data VALUES (1, 10.5), (2, 15.2), @@ -57,16 +57,47 @@ INSERT INTO sales_data VALUES (8, 45.9), (9, 50.4), (10, 100.6); +``` --- Calculate multiple percentiles +```sql SELECT percentile_array(amount, [0.25, 0.5, 0.75, 0.9]) as percentiles FROM sales_data; ``` +Calculate multiple percentiles. + +```text ++-----------------------------------------------------+ +| percentiles | ++-----------------------------------------------------+ +| [21.525000000000002, 33, 44.475, 55.41999999999998] | ++-----------------------------------------------------+ +``` + +```sql +SELECT percentile_array(if(amount>90, amount, NULL), [0.5, 0.99]) FROM sales_data; +``` + +Only non-NULL data is calculated. + +```text ++------------------------------------------------------------+ +| percentile_array(if(amount>90, amount, NULL), [0.5, 0.99]) | ++------------------------------------------------------------+ +| [100.6, 100.6] | ++------------------------------------------------------------+ +``` + +```sql +SELECT percentile_array(NULL, [0.5, 0.99]) FROM sales_data; +``` + +Returns an empty array when all input data is NULL. + ```text -+-----------------------------------------+ -| percentiles | -+-----------------------------------------+ -| [21.25, 32.5, 43.75, 54.99999999999998] | -+-----------------------------------------+ ++-------------------------------------+ +| percentile_array(NULL, [0.5, 0.99]) | ++-------------------------------------+ +| [] | ++-------------------------------------+ ``` diff --git a/docs/sql-manual/sql-functions/aggregate-functions/percentile.md b/docs/sql-manual/sql-functions/aggregate-functions/percentile.md index 2eae37c9dcb..af13d71287e 100644 --- a/docs/sql-manual/sql-functions/aggregate-functions/percentile.md +++ b/docs/sql-manual/sql-functions/aggregate-functions/percentile.md @@ -8,7 +8,6 @@ ## Description Calculates the exact percentile, suitable for small datasets. First sorts the specified column in descending order, then takes the exact p-th percentile. The value of `p` is between `0` and `1`. If `p` does not point to an exact position, it returns the [linear interpolation](https://en.wikipedia.org/wiki/Linear_interpolation) of the adjacent values at position `p`. Note that this is not the average of the two numbers. Special cases: -- Returns `NULL` when the input column is `NULL` ## Syntax @@ -20,17 +19,18 @@ PERCENTILE(<col>, <p>) | Parameter | Description | | -- | -- | -| `<col>` | The column to be calculated as the exact percentile, which must be an integer column. | -| `<p>` | The exact percentile to be calculated, a constant value, with a value range of `[0.0, 1.0]`. | +| `<col>` | The column to calculate the exact percentile for. Supported types: Double, Float, LargeInt, BigInt, Int, SmallInt, TinyInt. | +| `<p>` | The exact percentile to be calculated, a constant value. Supported type: Double. Range: `[0.0, 1.0]`. The second parameter must be a constant. | ## Return Value -Return the exact percentile of the specified column, with a return type of `DOUBLE`. +Return the exact percentile of the specified column, with a return type of Double. +If there is no valid data in the group, returns NULL. ## Examples ```sql --- Create sample table +-- Setup CREATE TABLE sales_data ( product_id INT, @@ -40,8 +40,6 @@ DISTRIBUTED BY HASH(`product_id`) BUCKETS AUTO PROPERTIES ( "replication_allocation" = "tag.location.default: 1" ); - --- Insert sample data INSERT INTO sales_data VALUES (1, 10.00), (1, 15.00), @@ -53,8 +51,9 @@ INSERT INTO sales_data VALUES (1, 45.00), (1, 50.00), (1, 100.00); +``` --- Calculate different percentiles of sales prices +```sql SELECT percentile(sale_price, 0.5) as median_price, -- Median percentile(sale_price, 0.75) as p75_price, -- 75th percentile @@ -64,6 +63,8 @@ SELECT FROM sales_data; ``` +Calculates sale prices at different percentiles. + ```text +--------------+-----------+-------------------+-------------------+----------+ | median_price | p75_price | p90_price | p95_price | p99_null | @@ -71,3 +72,31 @@ FROM sales_data; | 32.5 | 43.75 | 54.99999999999998 | 77.49999999999994 | NULL | +--------------+-----------+-------------------+-------------------+----------+ ``` + +```sql +select percentile(if(sale_price>90,sale_price,NULL), 0.5) from sales_data; +``` + +Only non-NULL input values are considered in the calculation. + +```text ++----------------------------------------------------+ +| percentile(if(sale_price>90,sale_price,NULL), 0.5) | ++----------------------------------------------------+ +| 100 | ++----------------------------------------------------+ +``` + +```sql +select percentile(sale_price, NULL) from sales_data; +``` + +If all input values are NULL, returns NULL. + +```text ++------------------------------+ +| percentile(sale_price, NULL) | ++------------------------------+ +| NULL | ++------------------------------+ +``` diff --git a/docs/sql-manual/sql-functions/aggregate-functions/regr-intercept.md b/docs/sql-manual/sql-functions/aggregate-functions/regr-intercept.md index 94fb3f93a41..601b7fe5a8e 100644 --- a/docs/sql-manual/sql-functions/aggregate-functions/regr-intercept.md +++ b/docs/sql-manual/sql-functions/aggregate-functions/regr-intercept.md @@ -23,12 +23,12 @@ REGR_INTERCEPT(<y>, <x>) | Parameter | Description | | -- | -- | -| `<y>` | The dependent variable. This must be an expression that can be evaluated to a numeric type. | -| `<x>` | The independent variable. This must be an expression that can be evaluated to a numeric type. | +| `<y>` | The dependent variable. Supported type: Double. | +| `<x>` | The independent variable. Supported type: Double. | ## Return Value -Return a `DOUBLE` value, representing the intercept of the univariate linear regression line for non-null pairs in a group. +Returns a Double value representing the intercept of the univariate linear regression line for non-null pairs in a group. If there are no rows, or only rows that contain nulls, the function returns NULL. ## Examples @@ -57,9 +57,23 @@ SELECT REGR_INTERCEPT(y, x) FROM test_regr_intercept; ``` ```text -+-------------------------+ -| regr_intercept(y, x) | -+-------------------------+ -| 5.512931034482759 | -+-------------------------+ -``` \ No newline at end of file ++----------------------+ +| REGR_INTERCEPT(y, x) | ++----------------------+ +| 5.512931034482759 | ++----------------------+ +``` + +```sql +SELECT REGR_INTERCEPT(y, x) FROM test_regr_intercept where x>100; +``` + +When there are no rows in the group, the function returns `NULL`. + +```text ++----------------------+ +| REGR_INTERCEPT(y, x) | ++----------------------+ +| NULL | ++----------------------+ +``` diff --git a/docs/sql-manual/sql-functions/aggregate-functions/regr-slope.md b/docs/sql-manual/sql-functions/aggregate-functions/regr-slope.md index 0b8a78cc6b7..80ceb0a61c7 100644 --- a/docs/sql-manual/sql-functions/aggregate-functions/regr-slope.md +++ b/docs/sql-manual/sql-functions/aggregate-functions/regr-slope.md @@ -20,17 +20,18 @@ REGR_SLOPE(<y>, <x>) | Parameter | Description | | -- | -- | -| `<y>` | The dependent variable. This must be an expression that can be evaluated to a numeric type. | -| `<x>` | The independent variable. This must be an expression that can be evaluated to a numeric type. | +| `<y>` | The dependent variable. Supported type: Double. | +| `<x>` | The independent variable. Supported type: Double. | ## Return Value -Returns a `DOUBLE` value representing the slope of the linear regression line. +Returns a Double value representing the slope of the linear regression line. +If there are no rows in the group, or all rows contain NULLs for the expressions, the function returns `NULL`. ## Examples ```sql --- Create example table +-- setup CREATE TABLE test_regr_slope ( `id` int, `x` int, @@ -48,15 +49,33 @@ INSERT INTO test_regr_slope VALUES (3, 12, 2), (4, 5, 6), (5, 10, 20); +``` --- Calculate the linear regression slope of x and y +```sql SELECT REGR_SLOPE(y, x) FROM test_regr_slope; ``` +Calculate the linear regression slope of x and y. + + +```text ++--------------------+ +| REGR_SLOPE(y, x) | ++--------------------+ +| 0.6853448275862069 | ++--------------------+ +``` + +```sql +SELECT REGR_SLOPE(y, x) FROM test_regr_slope where x>100; +``` + +When there are no rows in the group, the function returns `NULL`. + ```text -+----------------------+ -| regr_slope(y, x) | -+----------------------+ -| 0.6853448275862069 | -+----------------------+ ++------------------+ +| REGR_SLOPE(y, x) | ++------------------+ +| NULL | ++------------------+ ``` diff --git a/docs/sql-manual/sql-functions/aggregate-functions/regr_intercept.md b/docs/sql-manual/sql-functions/aggregate-functions/regr_intercept.md deleted file mode 100644 index e1e9d69bbd3..00000000000 --- a/docs/sql-manual/sql-functions/aggregate-functions/regr_intercept.md +++ /dev/null @@ -1,62 +0,0 @@ ---- -{ - "title": "REGR_INTERCEPT", - "language": "en" -} ---- - -## Description -REGR_INTERCEPT is used to calculate the y-intercept of the least squares-fit linear equation for a set of number pairs. - -## Syntax -``` -REGR_INTERCEPT(y, x) -``` - -## Parameters -- `y` (Numeric): The dependent variable. -- `x` (Numeric): The independent variable. - -Both `x` and `y` support basic numeric types. - -## Returned values -Returned data type: FLOAT64 - -The function returns the y-intercept of the linear regression line. - -If there are no rows, or only rows with null values, the function returns NULL. - -## Examples -```sql --- Example 1: Basic Usage -SELECT regr_intercept(y, x) FROM test; - --- Example 2: Usage in a query with sample data -SELECT * FROM test; -+------+------+------+ -| id | x | y | -+------+------+------+ -| 1 | 18 | 13 | -| 3 | 12 | 2 | -| 5 | 10 | 20 | -| 2 | 14 | 27 | -| 4 | 5 | 6 | -+------+------+------+ - -SELECT regr_intercept(y, x) FROM test; -+----------------------+ -| regr_intercept(y, x) | -+----------------------+ -| 5.512931034482759 | -+----------------------+ -``` - -## Usage notes -- This function ignores any pair where either value is null. -- In cases where the calculation would result in a division by zero, the function will return NULL. - -## Related functions -REGR_SLOPE, REGR_R2, REGR_COUNT, REGR_AVGX, REGR_AVGY - -## References -For more details about linear regression functions, please refer to the SQL standard documentation on aggregate functions. diff --git a/docs/sql-manual/sql-functions/aggregate-functions/regr_slope.md b/docs/sql-manual/sql-functions/aggregate-functions/regr_slope.md deleted file mode 100644 index 85a090be310..00000000000 --- a/docs/sql-manual/sql-functions/aggregate-functions/regr_slope.md +++ /dev/null @@ -1,62 +0,0 @@ ---- -{ - "title": "REGR_SLOPE", - "language": "en" -} ---- - -## Description -REGR_SLOPE is used to calculate the slope of the least squares-fit linear equation for a set of number pairs. - -## Syntax -``` -REGR_SLOPE(y, x) -``` - -## Parameters -- `y` (Numeric): The dependent variable. -- `x` (Numeric): The independent variable. - -Both `x` and `y` support basic numeric types. - -## Returned values -Returned data type: FLOAT64 - -The function returns the slope of the linear regression line. - -If there are no rows, or only rows with null values, the function returns NULL. - -## Examples -```sql --- Example 1: Basic Usage -SELECT regr_slope(y, x) FROM test; - --- Example 2: Usage in a query with sample data -SELECT * FROM test; -+------+------+------+ -| id | x | y | -+------+------+------+ -| 1 | 18 | 13 | -| 3 | 12 | 2 | -| 5 | 10 | 20 | -| 2 | 14 | 27 | -| 4 | 5 | 6 | -+------+------+------+ - -SELECT regr_slope(y, x) FROM test; -+--------------------+ -| regr_slope(y, x) | -+--------------------+ -| 0.6853448275862069 | -+--------------------+ -``` - -## Usage notes -- This function ignores any pair where either value is null. -- In cases where the calculation would result in a division by zero, the function will return NULL. - -## Related functions -REGR_INTERCEPT, REGR_R2, REGR_COUNT, REGR_AVGX, REGR_AVGY - -## References -For more details about linear regression functions, please refer to the SQL standard documentation on aggregate functions. diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/percentile_approx_weighted.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/percentile-approx-weighted.md similarity index 54% rename from i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/percentile_approx_weighted.md rename to i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/percentile-approx-weighted.md index 6645967819b..c6fcdb2730a 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/percentile_approx_weighted.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/percentile-approx-weighted.md @@ -24,19 +24,20 @@ PERCENTILE_APPROX_WEIGHTED(<col>, <weight>, <p> [, <compression>]) | 参数 | 说明 | | -- | -- | -| `<col>` | 需要计算百分位数的列 | -| `<weight>` | 权重列,必须是正数 | -| `<p>` | 百分位数值,取值范围 `[0.0, 1.0]`,例如 `0.99` 表示 `99` 分位数 | -| `<compression>` | 可选参数,压缩度,取值范围 `[2048, 10000]`。值越大,精度越高,但内存消耗也越大。如果不指定或超出范围,则使用 `10000`。 | +| `<col>` | 需要计算百分位数的列,支持类型为 Double 。 | +| `<weight>` | 权重列,必须是正数,支持类型为 Double 。 | +| `<p>` | 百分位数值,支持类型为 Double 。取值范围 `[0.0, 1.0]`,例如 `0.99` 表示 `99` 分位数 | +| `<compression>` | 可选参数,支持类型为 Double 。表示压缩度,取值范围 `[2048, 10000]`。值越大,精度越高,但内存消耗也越大。如果不指定或超出范围,则使用 `10000`。 | ## 返回值 -返回一个 `DOUBLE` 类型的值,表示计算得到的加权近似百分位数。 +返回一个 Double 类型的值,表示计算得到的加权近似百分位数。 +如果组内没有合法数据,则返回 NULL 。 ## 举例 ```sql --- 创建示例表 +-- setup CREATE TABLE weighted_scores ( student_id INT, score DECIMAL(10, 2), @@ -46,8 +47,6 @@ DISTRIBUTED BY HASH(student_id) BUCKETS AUTO PROPERTIES ( "replication_allocation" = "tag.location.default: 1" ); - --- 插入示例数据 INSERT INTO weighted_scores VALUES (1, 85.5, 1), -- 普通作业分数,权重 1 (2, 90.0, 2), -- 重要作业分数,权重 2 @@ -59,8 +58,9 @@ INSERT INTO weighted_scores VALUES (8, 89.5, 2), (9, 94.0, 3), (10, 83.5, 1); +``` --- 计算带权重的分数分布 +```sql SELECT -- 计算不同压缩度下的 90 分位数 percentile_approx_weighted(score, weight, 0.9) as p90_default, -- 默认压缩度 @@ -69,6 +69,8 @@ SELECT FROM weighted_scores; ``` +计算带权重的分数分布。 + ```text +------------------+------------------+------------------+ | p90_default | p90_fast | p90_accurate | @@ -77,4 +79,30 @@ FROM weighted_scores; +------------------+------------------+------------------+ ``` +```sql +select percentile_approx_weighted(if(score>95,score,null), weight, 0.9) from weighted_scores; +``` +只会计算输入的非 NULL 的数据。 + +```text ++------------------------------------------------------------------+ +| percentile_approx_weighted(if(score>95,score,null), weight, 0.9) | ++------------------------------------------------------------------+ +| 95.5 | ++------------------------------------------------------------------+ +``` + +```sql +select percentile_approx_weighted(score, weight, 0.9, null) from weighted_scores; +``` + +如果输入数据均为 NULL,则返回NULL。 + +```text ++------------------------------------------------------+ +| percentile_approx_weighted(score, weight, 0.9, null) | ++------------------------------------------------------+ +| NULL | ++------------------------------------------------------+ +``` \ No newline at end of file diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/percentile-approx.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/percentile-approx.md index a293e43e4a7..5bcf24501f1 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/percentile-approx.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/percentile-approx.md @@ -23,18 +23,19 @@ PERCENTILE_APPROX(<col>, <p> [, <compression>]) | 参数 | 说明 | | -- | -- | -| `<col>` | 需要计算百分位数的列 | -| `<p>` | 百分位数值,取值范围 `[0.0, 1.0]`,例如 `0.99` 表示 `99` 分位数 | -| `<compression>` | 可选参数,压缩度,取值范围 `[2048, 10000]`。值越大,精度越高,但内存消耗也越大。如果不指定或超出范围,则使用 `10000`。 | +| `<col>` | 需要计算百分位数的列,支持类型:Double。 | +| `<p>` | 百分位数,常量,支持类型为 Double,取值范围 `[0.0, 1.0]`,如 `0.99` 表示 99 分位。必须为常量。 | +| `<compression>` | 可选,压缩度,支持类型为 Double,取值范围 `[2048, 10000]`。值越大精度越高但内存消耗也越大。未指定或超出范围时默认 `10000`。 | ## 返回值 -返回一个 `DOUBLE` 类型的值,表示计算得到的近似百分位数。 +返回指定列的近似百分位数,类型为 Double。 +如果组内没有合法数据,则返回 NULL。 ## 举例 ```sql --- 创建示例表 +-- setup CREATE TABLE response_times ( request_id INT, response_time DECIMAL(10, 2) @@ -43,8 +44,6 @@ DISTRIBUTED BY HASH(`request_id`) BUCKETS AUTO PROPERTIES ( "replication_allocation" = "tag.location.default: 1" ); - --- 插入示例数据 INSERT INTO response_times VALUES (1, 10.5), (2, 15.2), @@ -56,7 +55,9 @@ INSERT INTO response_times VALUES (8, 45.9), (9, 50.4), (10, 100.6); +``` +```sql -- 使用不同压缩度计算 99 分位数 SELECT percentile_approx(response_time, 0.99) as p99_default, -- 默认压缩度 @@ -73,4 +74,32 @@ FROM response_times; +-------------------+-------------------+-------------------+ ``` +```sql +SELECT percentile_approx(if(response_time>90,response_time,NULL), 0.5) FROM response_times; +``` + +只计算非 NULL 数据。 + +```text ++-----------------------------------------------------------------+ +| percentile_approx(if(response_time>90,response_time,NULL), 0.5) | ++-----------------------------------------------------------------+ +| 100.5999984741211 | ++-----------------------------------------------------------------+ +``` + +```sql +SELECT percentile_approx(NULL, 0.99) FROM response_times; +``` + +输入数据均为 NULL 时返回 NULL。 + +```text ++-------------------------------+ +| percentile_approx(NULL, 0.99) | ++-------------------------------+ +| NULL | ++-------------------------------+ +``` + diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/percentile-array.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/percentile-array.md index 992b5fad561..1f806f23c91 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/percentile-array.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/percentile-array.md @@ -24,17 +24,18 @@ PERCENTILE_ARRAY(<col>, <array_p>) | 参数 | 说明 | | -- | -- | -| `<col>` | 需要计算百分位数的列 | -| `<array_p>` | 百分位数数组,数组中的每个元素必须在 `[0.0, 1.0]` 范围内,例如 `[0.5, 0.95, 0.99]` | +| `<col>` | 需要被计算精确百分位数的列,支持类型:Double、Float、LargeInt、BigInt、Int、SmallInt、TinyInt。 | +| `<array_p>` | 百分位数数组,数组中的每个元素必须为常量,类型为 Array<Double>,取值范围为 `[0.0, 1.0]`,如 `[0.5, 0.95, 0.99]`。 | ## 返回值 -返回一个 `DOUBLE` 类型的数组,包含了对应于输入百分位数数组的计算结果。 +返回一个 DOUBLE 类型的数组,包含了对应于输入百分位数数组的计算结果。 +如果组内没有合法数据,则返回空数组。 ## 举例 ```sql --- 创建示例表 +-- setup CREATE TABLE sales_data ( id INT, amount DECIMAL(10, 2) @@ -43,8 +44,6 @@ DISTRIBUTED BY HASH(`id`) BUCKETS AUTO PROPERTIES ( "replication_allocation" = "tag.location.default: 1" ); - --- 插入示例数据 INSERT INTO sales_data VALUES (1, 10.5), (2, 15.2), @@ -56,16 +55,47 @@ INSERT INTO sales_data VALUES (8, 45.9), (9, 50.4), (10, 100.6); +``` --- 计算多个百分位数 +```sql SELECT percentile_array(amount, [0.25, 0.5, 0.75, 0.9]) as percentiles FROM sales_data; ``` +计算多个百分位数。 + +```text ++-----------------------------------------------------+ +| percentiles | ++-----------------------------------------------------+ +| [21.525000000000002, 33, 44.475, 55.41999999999998] | ++-----------------------------------------------------+ +``` + +```sql +SELECT percentile_array(if(amount>90, amount, NULL), [0.5, 0.99]) FROM sales_data; +``` + +只计算非 NULL 数据。 + +```text ++------------------------------------------------------------+ +| percentile_array(if(amount>90, amount, NULL), [0.5, 0.99]) | ++------------------------------------------------------------+ +| [100.6, 100.6] | ++------------------------------------------------------------+ +``` + +```sql +SELECT percentile_array(NULL, [0.5, 0.99]) FROM sales_data; +``` + +输入数据均为 NULL 时返回空数组。 + ```text -+-----------------------------------------+ -| percentiles | -+-----------------------------------------+ -| [21.25, 32.5, 43.75, 54.99999999999998] | -+-----------------------------------------+ ++-------------------------------------+ +| percentile_array(NULL, [0.5, 0.99]) | ++-------------------------------------+ +| [] | ++-------------------------------------+ ``` diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/percentile.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/percentile.md index 69510333e43..a6d9c33cd02 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/percentile.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/percentile.md @@ -8,7 +8,6 @@ ## 描述 计算精确的百分位数,适用于小数据量。先对指定列降序排列,然后取精确的第 `p` 位百分数。`p` 的值介于 `0` 到 `1` 之间,如果 `p` 不指向精确的位置,则返回所指位置两侧相邻数值在 `p` 处的[线性插值](https://zh.wikipedia.org/wiki/%E7%BA%BF%E6%80%A7%E6%8F%92%E5%80%BC),注意这不是两数字的平均数。特殊情况: -- 当输入的列为 `NULL` 时,返回 `NULL` ## 语法 @@ -20,17 +19,18 @@ PERCENTILE(<col>, <p>) | 参数 | 说明 | | -- | -- | -| `<col>` | 需要被计算精确的百分位数的列,必须是整数类型的列。 | -| `<p>` | 需要精确的百分位数,常量,取值为 `[0.0, 1.0]`。 | +| `<col>` | 需要被计算精确的百分位数的列,支持类型:Double、Float、LargeInt、BigInt、Int、SmallInt、TinyInt。 | +| `<p>` | 需要精确的百分位数,常量,支持类型:Double,取值范围为 `[0.0, 1.0]`。并且要求为常量(非运行时列)。 | ## 返回值 -返回指定列的精确的百分位数,类型为 `DOUBLE`。 +返回指定列的精确的百分位数,类型为 Double。 +如果组内没有合法数据,则返回 NULL 。 ## 举例 ```sql --- 创建示例表 +-- setup CREATE TABLE sales_data ( product_id INT, @@ -40,8 +40,6 @@ DISTRIBUTED BY HASH(`product_id`) BUCKETS AUTO PROPERTIES ( "replication_allocation" = "tag.location.default: 1" ); - --- 插入示例数据 INSERT INTO sales_data VALUES (1, 10.00), (1, 15.00), @@ -53,8 +51,9 @@ INSERT INTO sales_data VALUES (1, 45.00), (1, 50.00), (1, 100.00); +``` --- 计算不同百分位的销售价格 +```sql SELECT percentile(sale_price, 0.5) as median_price, -- 中位数 percentile(sale_price, 0.75) as p75_price, -- 75 分位数 @@ -64,6 +63,8 @@ SELECT FROM sales_data; ``` +计算不同百分位的销售价格。 + ```text +--------------+-----------+-------------------+-------------------+----------+ | median_price | p75_price | p90_price | p95_price | p99_null | @@ -71,3 +72,31 @@ FROM sales_data; | 32.5 | 43.75 | 54.99999999999998 | 77.49999999999994 | NULL | +--------------+-----------+-------------------+-------------------+----------+ ``` + +```sql +select percentile(if(sale_price>90,sale_price,NULL), 0.5) from sales_data; +``` + +只会计算输入的非 NULL 的数据。 + +```text ++----------------------------------------------------+ +| percentile(if(sale_price>90,sale_price,NULL), 0.5) | ++----------------------------------------------------+ +| 100 | ++----------------------------------------------------+ +``` + +```sql +select percentile(sale_price, NULL) from sales_data; +``` + +如果输入数据均为 NULL,则返回NULL。 + +```text ++------------------------------+ +| percentile(sale_price, NULL) | ++------------------------------+ +| NULL | ++------------------------------+ +``` \ No newline at end of file diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/regr-intercept.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/regr-intercept.md index 44d030a2a66..00e5d773cfb 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/regr-intercept.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/regr-intercept.md @@ -23,17 +23,18 @@ REGR_INTERCEPT(<y>, <x>) | 参数 | 说明 | | -- | -- | -| `<y>` | 因变量,必须是可以计算为数值类型的表达式。 | -| `<x>` | 自变量,必须是可以计算为数值类型的表达式。 | +| `<y>` | 因变量,支持类型为 Double。 | +| `<x>` | 自变量,支持类型为 Double。 | ## 返回值 -返回 `DOUBLE` 类型的值,表示线性回归线与 `y` 轴的交点。 +返回 Double 类型的值,表示线性回归线与 `y` 轴的交点。 +如果没有行,或者只有包含空值的行,函数返回 NULL。 ## 举例 ```sql --- 创建示例表 +-- setup CREATE TABLE test_regr_intercept ( `id` int, `x` int, @@ -51,15 +52,32 @@ INSERT INTO test_regr_intercept VALUES (3, 12, 2), (4, 5, 6), (5, 10, 20); +``` --- 计算 x 和 y 的线性回归截距 +```sql SELECT REGR_INTERCEPT(y, x) FROM test_regr_intercept; ``` +计算 x 和 y 的线性回归截距。 + ```text +-------------------------+ | regr_intercept(y, x) | +-------------------------+ | 5.512931034482759 | +-------------------------+ -``` \ No newline at end of file +``` + +```sql +SELECT REGR_INTERCEPT(y, x) FROM test_regr_intercept where x>100; +``` + +组内没有数据时,返回 NULL 。 + +```text ++----------------------+ +| REGR_INTERCEPT(y, x) | ++----------------------+ +| NULL | ++----------------------+ +``` diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/regr-slope.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/regr-slope.md index c2fc4c89830..8af7b025cb4 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/regr-slope.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/regr-slope.md @@ -20,17 +20,18 @@ REGR_SLOPE(<y>, <x>) | 参数 | 说明 | | -- | -- | -| `<y>` | 因变量,必须是可以计算为数值类型的表达式。 | -| `<x>` | 自变量,必须是可以计算为数值类型的表达式。 | +| `<y>` | 因变量,支持类型为 Double。 | +| `<x>` | 自变量,支持类型为 Double。 | ## 返回值 -返回 DOUBLE 类型的值,表示线性回归线的斜率。 +返回 Double 类型的值,表示线性回归线的斜率。 +如果没有行,或者只有包含空值的行,函数返回 NULL。 ## 举例 ```sql --- 创建示例表 +-- setup CREATE TABLE test_regr_slope ( `id` int, `x` int, @@ -48,15 +49,32 @@ INSERT INTO test_regr_slope VALUES (3, 12, 2), (4, 5, 6), (5, 10, 20); +``` --- 计算 x 和 y 的线性回归斜率 +```sql SELECT REGR_SLOPE(y, x) FROM test_regr_slope; ``` +计算 x 和 y 的线性回归截距。 + +```text ++--------------------+ +| REGR_SLOPE(y, x) | ++--------------------+ +| 0.6853448275862069 | ++--------------------+ +``` + +```sql +SELECT REGR_SLOPE(y, x) FROM test_regr_slope where x>100; +``` + +组内没有数据时,返回 NULL 。 + ```text -+----------------------+ -| regr_slope(y, x) | -+----------------------+ -| 0.6853448275862069 | -+----------------------+ ++------------------+ +| REGR_SLOPE(y, x) | ++------------------+ +| NULL | ++------------------+ ``` diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/regr_intercept.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/regr_intercept.md deleted file mode 100644 index a619daac246..00000000000 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/regr_intercept.md +++ /dev/null @@ -1,62 +0,0 @@ ---- -{ - "title": "REGR_INTERCEPT", - "language": "zh-CN" -} ---- - -## Description -REGR_INTERCEPT 用于计算一组数值对的最小二乘拟合线性方程的截距。 - -## Syntax -``` -REGR_INTERCEPT(y, x) -``` - -## Parameters -- `y` (数值类型):因变量。 -- `x` (数值类型):自变量。 - -x 和 y 都支持基本数值类型。 - -## Returned values -返回数据类型:FLOAT64 - -函数返回线性回归直线的截距。 - -如果没有行,或者只有包含空值的行,函数返回 NULL。 - -## Examples -```sql --- 示例 1:基本用法 -SELECT regr_intercept(y, x) FROM test; - --- 示例 2:在查询中使用示例数据 -SELECT * FROM test; -+------+------+------+ -| id | x | y | -+------+------+------+ -| 1 | 18 | 13 | -| 3 | 12 | 2 | -| 5 | 10 | 20 | -| 2 | 14 | 27 | -| 4 | 5 | 6 | -+------+------+------+ - -SELECT regr_intercept(y, x) FROM test; -+----------------------+ -| regr_intercept(y, x) | -+----------------------+ -| 5.512931034482759 | -+----------------------+ -``` - -## Usage notes -- 此函数会忽略任何包含空值的数值对。 -- 在计算结果会导致除以零的情况下,函数将返回 NULL。 - -## Related functions -REGR_SLOPE, REGR_R2, REGR_COUNT, REGR_AVGX, REGR_AVGY - -## References -有关线性回归函数的更多详细信息,请参阅 SQL 标准文档中关于聚合函数的部分。 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/regr_slope.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/regr_slope.md deleted file mode 100644 index 295ef814cab..00000000000 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/regr_slope.md +++ /dev/null @@ -1,62 +0,0 @@ ---- -{ - "title": "REGR_SLOPE", - "language": "zh-CN" -} ---- - -## Description -REGR_SLOPE 用于计算一组数值对的最小二乘拟合线性方程的斜率。 - -## Syntax -``` -REGR_SLOPE(y, x) -``` - -## Parameters -- `y` (数值类型):因变量。 -- `x` (数值类型):自变量。 - -x 和 y 都支持基本数值类型。 - -## Returned values -返回数据类型:FLOAT64 - -函数返回线性回归直线的斜率。 - -如果没有行,或者只有包含空值的行,函数返回 NULL。 - -## Examples -```sql --- 示例 1:基本用法 -SELECT regr_slope(y, x) FROM test; - --- 示例 2:在查询中使用示例数据 -SELECT * FROM test; -+------+------+------+ -| id | x | y | -+------+------+------+ -| 1 | 18 | 13 | -| 3 | 12 | 2 | -| 5 | 10 | 20 | -| 2 | 14 | 27 | -| 4 | 5 | 6 | -+------+------+------+ - -SELECT regr_slope(y, x) FROM test; -+--------------------+ -| regr_slope(y, x) | -+--------------------+ -| 0.6853448275862069 | -+--------------------+ -``` - -## Usage notes -- 此函数会忽略任何包含空值的数值对。 -- 在计算结果会导致除以零的情况下,函数将返回 NULL。 - -## Related functions -REGR_INTERCEPT, REGR_R2, REGR_COUNT, REGR_AVGX, REGR_AVGY - -## References -有关线性回归函数的更多详细信息,请参阅 SQL 标准文档中关于聚合函数的部分。 diff --git a/sidebars.json b/sidebars.json index fd432f1dfa8..d3ecccf368a 100644 --- a/sidebars.json +++ b/sidebars.json @@ -1742,7 +1742,7 @@ "sql-manual/sql-functions/aggregate-functions/percentile", "sql-manual/sql-functions/aggregate-functions/percentile-approx", "sql-manual/sql-functions/aggregate-functions/percentile-array", - "sql-manual/sql-functions/aggregate-functions/percentile_approx_weighted", + "sql-manual/sql-functions/aggregate-functions/percentile-approx-weighted", "sql-manual/sql-functions/aggregate-functions/quantile-union", "sql-manual/sql-functions/aggregate-functions/regr-intercept", "sql-manual/sql-functions/aggregate-functions/regr-slope", @@ -1830,7 +1830,7 @@ "sql-manual/sql-functions/aggregate-functions/percentile", "sql-manual/sql-functions/aggregate-functions/percentile-approx", "sql-manual/sql-functions/aggregate-functions/percentile-array", - "sql-manual/sql-functions/aggregate-functions/percentile_approx_weighted", + "sql-manual/sql-functions/aggregate-functions/percentile-approx-weighted", "sql-manual/sql-functions/aggregate-functions/quantile-union", "sql-manual/sql-functions/aggregate-functions/regr-intercept", "sql-manual/sql-functions/aggregate-functions/regr-slope", --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
