This is an automated email from the ASF dual-hosted git repository.
panxiaolei pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new 5a320304483 update doc of percentile series/regr series (#2762)
5a320304483 is described below
commit 5a320304483006974d7c392b809cd5b2d861ee17
Author: Pxl <[email protected]>
AuthorDate: Mon Aug 25 14:21:56 2025 +0800
update doc of percentile series/regr series (#2762)
## Versions
- [x] dev
- [ ] 3.0
- [ ] 2.1
- [ ] 2.0
## Languages
- [x] Chinese
- [x] English
## Docs Checklist
- [ ] Checked by AI
- [ ] Test Cases Built
---
...x_weighted.md => percentile-approx-weighted.md} | 36 +++++++++++--
.../aggregate-functions/percentile-approx.md | 43 ++++++++++++---
.../aggregate-functions/percentile-array.md | 55 ++++++++++++++-----
.../aggregate-functions/percentile.md | 45 +++++++++++++---
.../aggregate-functions/regr-intercept.md | 32 +++++++----
.../aggregate-functions/regr-slope.md | 39 ++++++++++----
.../aggregate-functions/regr_intercept.md | 62 ----------------------
.../aggregate-functions/regr_slope.md | 62 ----------------------
...x_weighted.md => percentile-approx-weighted.md} | 46 ++++++++++++----
.../aggregate-functions/percentile-approx.md | 43 ++++++++++++---
.../aggregate-functions/percentile-array.md | 54 ++++++++++++++-----
.../aggregate-functions/percentile.md | 45 +++++++++++++---
.../aggregate-functions/regr-intercept.md | 30 ++++++++---
.../aggregate-functions/regr-slope.md | 38 +++++++++----
.../aggregate-functions/regr_intercept.md | 62 ----------------------
.../aggregate-functions/regr_slope.md | 62 ----------------------
sidebars.json | 6 +--
17 files changed, 407 insertions(+), 353 deletions(-)
diff --git
a/docs/sql-manual/sql-functions/aggregate-functions/percentile_approx_weighted.md
b/docs/sql-manual/sql-functions/aggregate-functions/percentile-approx-weighted.md
similarity index 63%
rename from
docs/sql-manual/sql-functions/aggregate-functions/percentile_approx_weighted.md
rename to
docs/sql-manual/sql-functions/aggregate-functions/percentile-approx-weighted.md
index 00c63988968..843d43e119e 100644
---
a/docs/sql-manual/sql-functions/aggregate-functions/percentile_approx_weighted.md
+++
b/docs/sql-manual/sql-functions/aggregate-functions/percentile-approx-weighted.md
@@ -24,14 +24,15 @@ PERCENTILE_APPROX_WEIGHTED(<col>, <weight>, <p> [,
<compression>])
| Parameter | Description |
| -- | -- |
-| `<col>` | The column to calculate the percentile for |
-| `<weight>` | Weight column, must be positive numbers |
+| `<col>` | The column to calculate the percentile for. Supported type: Double
|
+| `<weight>` | Weight column, must be positive numbers. Supported type: Double
|
| `<p>` | Percentile value, range `[0.0, 1.0]`, e.g., `0.99` represents the
`99`th percentile |
-| `<compression>` | Optional parameter, compression ratio, range `[2048,
10000]`. The higher the value, the higher the precision, but the greater the
memory consumption. If not specified or outside the range, use `10000`. |
+| `<compression>` | Optional parameter, supported type: Double. Compression
ratio, range `[2048, 10000]`. The higher the value, the higher the precision,
but the greater the memory consumption. If not specified or outside the range,
`10000` is used. |
## Return Value
Return a `DOUBLE` type value, representing the calculated weighted approximate
percentile.
+If there is no valid data in the group, returns `NULL`.
## Examples
@@ -78,3 +79,32 @@ FROM weighted_scores;
```
+```sql
+select percentile_approx_weighted(if(score>95,score,null), weight, 0.9) from
weighted_scores;
+```
+
+This will only consider non-NULL inputs.
+
+```text
++------------------------------------------------------------------+
+| percentile_approx_weighted(if(score>95,score,null), weight, 0.9) |
++------------------------------------------------------------------+
+| 95.5 |
++------------------------------------------------------------------+
+```
+
+```sql
+select percentile_approx_weighted(score, weight, 0.9, null) from
weighted_scores;
+```
+
+If all input values are NULL, the function returns NULL.
+
+```text
++------------------------------------------------------+
+| percentile_approx_weighted(score, weight, 0.9, null) |
++------------------------------------------------------+
+| NULL |
++------------------------------------------------------+
+```
+
+
diff --git
a/docs/sql-manual/sql-functions/aggregate-functions/percentile-approx.md
b/docs/sql-manual/sql-functions/aggregate-functions/percentile-approx.md
index 4d6c906cb4a..737ed1b5e78 100644
--- a/docs/sql-manual/sql-functions/aggregate-functions/percentile-approx.md
+++ b/docs/sql-manual/sql-functions/aggregate-functions/percentile-approx.md
@@ -24,18 +24,19 @@ PERCENTILE_APPROX(<col>, <p> [, <compression>])
| Parameter | Description |
| -- | -- |
-| `<col>` | The column to calculate percentiles for |
-| `<p>` | Percentile value, range `[0.0, 1.0]`, e.g., `0.99` represents the
`99th` percentile |
-| `<compression>` | Optional parameter, compression level, range `[2048,
10000]`, higher values increase precision but consume more memory. If not
specified or out of range, uses `10000` |
+| `<col>` | The column to calculate percentiles for, supported type: Double. |
+| `<p>` | Percentile value, constant, type Double, range `[0.0, 1.0]`, e.g.,
`0.99` means 99th percentile. Must be a constant. |
+| `<compression>` | Optional, compression level, type Double, range `[2048,
10000]`. Higher values increase precision but consume more memory. If not
specified or out of range, defaults to `10000`. |
## Return Value
-Returns a `DOUBLE` value representing the calculated approximate percentile.
+Returns the approximate percentile of the specified column, type Double.
+If there is no valid data in the group, returns NULL.
## Examples
```sql
--- Create sample table
+-- setup
CREATE TABLE response_times (
request_id INT,
response_time DECIMAL(10, 2)
@@ -44,8 +45,6 @@ DISTRIBUTED BY HASH(`request_id`) BUCKETS AUTO
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
-
--- Insert sample data
INSERT INTO response_times VALUES
(1, 10.5),
(2, 15.2),
@@ -57,7 +56,9 @@ INSERT INTO response_times VALUES
(8, 45.9),
(9, 50.4),
(10, 100.6);
+```
+```sql
-- Calculate 99th percentile using different compression levels
SELECT
percentile_approx(response_time, 0.99) as p99_default, -- Default
compression
@@ -74,4 +75,32 @@ FROM response_times;
+-------------------+-------------------+-------------------+
```
+```sql
+SELECT percentile_approx(if(response_time>90,response_time,NULL), 0.5) FROM
response_times;
+```
+
+Only non-NULL data is calculated.
+
+```text
++-----------------------------------------------------------------+
+| percentile_approx(if(response_time>90,response_time,NULL), 0.5) |
++-----------------------------------------------------------------+
+| 100.5999984741211 |
++-----------------------------------------------------------------+
+```
+
+```sql
+SELECT percentile_approx(NULL, 0.99) FROM response_times;
+```
+
+Returns NULL when all input data is NULL.
+
+```text
++-------------------------------+
+| percentile_approx(NULL, 0.99) |
++-------------------------------+
+| NULL |
++-------------------------------+
+```
+
diff --git
a/docs/sql-manual/sql-functions/aggregate-functions/percentile-array.md
b/docs/sql-manual/sql-functions/aggregate-functions/percentile-array.md
index db333cb783a..16754503c3a 100644
--- a/docs/sql-manual/sql-functions/aggregate-functions/percentile-array.md
+++ b/docs/sql-manual/sql-functions/aggregate-functions/percentile-array.md
@@ -25,17 +25,19 @@ PERCENTILE_ARRAY(<col>, <array_p>)
| Parameter | Description |
| -- | -- |
-| `<col>` | The column to calculate the percentile for |
-| `<array_p>` | Percentile array, each element must be in the range `[0.0,
1.0]`, e.g., `[0.5, 0.95, 0.99]` |
+| `<col>` | The column to calculate the exact percentiles for. Supported
types: Double, Float, LargeInt, BigInt, Int, SmallInt, TinyInt. |
+| `<array_p>` | Percentile array, each element must be a constant of type
Array<Double>, with values in the range `[0.0, 1.0]`, e.g., `[0.5, 0.95,
0.99]`. |
## Return Value
-Return a `DOUBLE` type array, containing the calculated percentile values.
+Returns a DOUBLE type array containing the calculated percentile values.
+If there is no valid data in the group, returns an empty array.
+
## Examples
```sql
--- Create sample table
+-- setup
CREATE TABLE sales_data (
id INT,
amount DECIMAL(10, 2)
@@ -44,8 +46,6 @@ DISTRIBUTED BY HASH(`id`) BUCKETS AUTO
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
-
--- Insert sample data
INSERT INTO sales_data VALUES
(1, 10.5),
(2, 15.2),
@@ -57,16 +57,47 @@ INSERT INTO sales_data VALUES
(8, 45.9),
(9, 50.4),
(10, 100.6);
+```
--- Calculate multiple percentiles
+```sql
SELECT percentile_array(amount, [0.25, 0.5, 0.75, 0.9]) as percentiles
FROM sales_data;
```
+Calculate multiple percentiles.
+
+```text
++-----------------------------------------------------+
+| percentiles |
++-----------------------------------------------------+
+| [21.525000000000002, 33, 44.475, 55.41999999999998] |
++-----------------------------------------------------+
+```
+
+```sql
+SELECT percentile_array(if(amount>90, amount, NULL), [0.5, 0.99]) FROM
sales_data;
+```
+
+Only non-NULL data is calculated.
+
+```text
++------------------------------------------------------------+
+| percentile_array(if(amount>90, amount, NULL), [0.5, 0.99]) |
++------------------------------------------------------------+
+| [100.6, 100.6] |
++------------------------------------------------------------+
+```
+
+```sql
+SELECT percentile_array(NULL, [0.5, 0.99]) FROM sales_data;
+```
+
+Returns an empty array when all input data is NULL.
+
```text
-+-----------------------------------------+
-| percentiles |
-+-----------------------------------------+
-| [21.25, 32.5, 43.75, 54.99999999999998] |
-+-----------------------------------------+
++-------------------------------------+
+| percentile_array(NULL, [0.5, 0.99]) |
++-------------------------------------+
+| [] |
++-------------------------------------+
```
diff --git a/docs/sql-manual/sql-functions/aggregate-functions/percentile.md
b/docs/sql-manual/sql-functions/aggregate-functions/percentile.md
index 2eae37c9dcb..af13d71287e 100644
--- a/docs/sql-manual/sql-functions/aggregate-functions/percentile.md
+++ b/docs/sql-manual/sql-functions/aggregate-functions/percentile.md
@@ -8,7 +8,6 @@
## Description
Calculates the exact percentile, suitable for small datasets. First sorts the
specified column in descending order, then takes the exact p-th percentile. The
value of `p` is between `0` and `1`. If `p` does not point to an exact
position, it returns the [linear
interpolation](https://en.wikipedia.org/wiki/Linear_interpolation) of the
adjacent values at position `p`. Note that this is not the average of the two
numbers. Special cases:
-- Returns `NULL` when the input column is `NULL`
## Syntax
@@ -20,17 +19,18 @@ PERCENTILE(<col>, <p>)
| Parameter | Description |
| -- | -- |
-| `<col>` | The column to be calculated as the exact percentile, which must be
an integer column. |
-| `<p>` | The exact percentile to be calculated, a constant value, with a
value range of `[0.0, 1.0]`. |
+| `<col>` | The column to calculate the exact percentile for. Supported types:
Double, Float, LargeInt, BigInt, Int, SmallInt, TinyInt. |
+| `<p>` | The exact percentile to be calculated, a constant value. Supported
type: Double. Range: `[0.0, 1.0]`. The second parameter must be a constant. |
## Return Value
-Return the exact percentile of the specified column, with a return type of
`DOUBLE`.
+Return the exact percentile of the specified column, with a return type of
Double.
+If there is no valid data in the group, returns NULL.
## Examples
```sql
--- Create sample table
+-- Setup
CREATE TABLE sales_data
(
product_id INT,
@@ -40,8 +40,6 @@ DISTRIBUTED BY HASH(`product_id`) BUCKETS AUTO
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
-
--- Insert sample data
INSERT INTO sales_data VALUES
(1, 10.00),
(1, 15.00),
@@ -53,8 +51,9 @@ INSERT INTO sales_data VALUES
(1, 45.00),
(1, 50.00),
(1, 100.00);
+```
--- Calculate different percentiles of sales prices
+```sql
SELECT
percentile(sale_price, 0.5) as median_price, -- Median
percentile(sale_price, 0.75) as p75_price, -- 75th percentile
@@ -64,6 +63,8 @@ SELECT
FROM sales_data;
```
+Calculates sale prices at different percentiles.
+
```text
+--------------+-----------+-------------------+-------------------+----------+
| median_price | p75_price | p90_price | p95_price | p99_null |
@@ -71,3 +72,31 @@ FROM sales_data;
| 32.5 | 43.75 | 54.99999999999998 | 77.49999999999994 | NULL |
+--------------+-----------+-------------------+-------------------+----------+
```
+
+```sql
+select percentile(if(sale_price>90,sale_price,NULL), 0.5) from sales_data;
+```
+
+Only non-NULL input values are considered in the calculation.
+
+```text
++----------------------------------------------------+
+| percentile(if(sale_price>90,sale_price,NULL), 0.5) |
++----------------------------------------------------+
+| 100 |
++----------------------------------------------------+
+```
+
+```sql
+select percentile(sale_price, NULL) from sales_data;
+```
+
+If all input values are NULL, returns NULL.
+
+```text
++------------------------------+
+| percentile(sale_price, NULL) |
++------------------------------+
+| NULL |
++------------------------------+
+```
diff --git
a/docs/sql-manual/sql-functions/aggregate-functions/regr-intercept.md
b/docs/sql-manual/sql-functions/aggregate-functions/regr-intercept.md
index 94fb3f93a41..601b7fe5a8e 100644
--- a/docs/sql-manual/sql-functions/aggregate-functions/regr-intercept.md
+++ b/docs/sql-manual/sql-functions/aggregate-functions/regr-intercept.md
@@ -23,12 +23,12 @@ REGR_INTERCEPT(<y>, <x>)
| Parameter | Description |
| -- | -- |
-| `<y>` | The dependent variable. This must be an expression that can be
evaluated to a numeric type. |
-| `<x>` | The independent variable. This must be an expression that can be
evaluated to a numeric type. |
+| `<y>` | The dependent variable. Supported type: Double. |
+| `<x>` | The independent variable. Supported type: Double. |
## Return Value
-Return a `DOUBLE` value, representing the intercept of the univariate linear
regression line for non-null pairs in a group.
+Returns a Double value representing the intercept of the univariate linear
regression line for non-null pairs in a group. If there are no rows, or only
rows that contain nulls, the function returns NULL.
## Examples
@@ -57,9 +57,23 @@ SELECT REGR_INTERCEPT(y, x) FROM test_regr_intercept;
```
```text
-+-------------------------+
-| regr_intercept(y, x) |
-+-------------------------+
-| 5.512931034482759 |
-+-------------------------+
-```
\ No newline at end of file
++----------------------+
+| REGR_INTERCEPT(y, x) |
++----------------------+
+| 5.512931034482759 |
++----------------------+
+```
+
+```sql
+SELECT REGR_INTERCEPT(y, x) FROM test_regr_intercept where x>100;
+```
+
+When there are no rows in the group, the function returns `NULL`.
+
+```text
++----------------------+
+| REGR_INTERCEPT(y, x) |
++----------------------+
+| NULL |
++----------------------+
+```
diff --git a/docs/sql-manual/sql-functions/aggregate-functions/regr-slope.md
b/docs/sql-manual/sql-functions/aggregate-functions/regr-slope.md
index 0b8a78cc6b7..80ceb0a61c7 100644
--- a/docs/sql-manual/sql-functions/aggregate-functions/regr-slope.md
+++ b/docs/sql-manual/sql-functions/aggregate-functions/regr-slope.md
@@ -20,17 +20,18 @@ REGR_SLOPE(<y>, <x>)
| Parameter | Description |
| -- | -- |
-| `<y>` | The dependent variable. This must be an expression that can be
evaluated to a numeric type. |
-| `<x>` | The independent variable. This must be an expression that can be
evaluated to a numeric type. |
+| `<y>` | The dependent variable. Supported type: Double. |
+| `<x>` | The independent variable. Supported type: Double. |
## Return Value
-Returns a `DOUBLE` value representing the slope of the linear regression line.
+Returns a Double value representing the slope of the linear regression line.
+If there are no rows in the group, or all rows contain NULLs for the
expressions, the function returns `NULL`.
## Examples
```sql
--- Create example table
+-- setup
CREATE TABLE test_regr_slope (
`id` int,
`x` int,
@@ -48,15 +49,33 @@ INSERT INTO test_regr_slope VALUES
(3, 12, 2),
(4, 5, 6),
(5, 10, 20);
+```
--- Calculate the linear regression slope of x and y
+```sql
SELECT REGR_SLOPE(y, x) FROM test_regr_slope;
```
+Calculate the linear regression slope of x and y.
+
+
+```text
++--------------------+
+| REGR_SLOPE(y, x) |
++--------------------+
+| 0.6853448275862069 |
++--------------------+
+```
+
+```sql
+SELECT REGR_SLOPE(y, x) FROM test_regr_slope where x>100;
+```
+
+When there are no rows in the group, the function returns `NULL`.
+
```text
-+----------------------+
-| regr_slope(y, x) |
-+----------------------+
-| 0.6853448275862069 |
-+----------------------+
++------------------+
+| REGR_SLOPE(y, x) |
++------------------+
+| NULL |
++------------------+
```
diff --git
a/docs/sql-manual/sql-functions/aggregate-functions/regr_intercept.md
b/docs/sql-manual/sql-functions/aggregate-functions/regr_intercept.md
deleted file mode 100644
index e1e9d69bbd3..00000000000
--- a/docs/sql-manual/sql-functions/aggregate-functions/regr_intercept.md
+++ /dev/null
@@ -1,62 +0,0 @@
----
-{
- "title": "REGR_INTERCEPT",
- "language": "en"
-}
----
-
-## Description
-REGR_INTERCEPT is used to calculate the y-intercept of the least squares-fit
linear equation for a set of number pairs.
-
-## Syntax
-```
-REGR_INTERCEPT(y, x)
-```
-
-## Parameters
-- `y` (Numeric): The dependent variable.
-- `x` (Numeric): The independent variable.
-
-Both `x` and `y` support basic numeric types.
-
-## Returned values
-Returned data type: FLOAT64
-
-The function returns the y-intercept of the linear regression line.
-
-If there are no rows, or only rows with null values, the function returns NULL.
-
-## Examples
-```sql
--- Example 1: Basic Usage
-SELECT regr_intercept(y, x) FROM test;
-
--- Example 2: Usage in a query with sample data
-SELECT * FROM test;
-+------+------+------+
-| id | x | y |
-+------+------+------+
-| 1 | 18 | 13 |
-| 3 | 12 | 2 |
-| 5 | 10 | 20 |
-| 2 | 14 | 27 |
-| 4 | 5 | 6 |
-+------+------+------+
-
-SELECT regr_intercept(y, x) FROM test;
-+----------------------+
-| regr_intercept(y, x) |
-+----------------------+
-| 5.512931034482759 |
-+----------------------+
-```
-
-## Usage notes
-- This function ignores any pair where either value is null.
-- In cases where the calculation would result in a division by zero, the
function will return NULL.
-
-## Related functions
-REGR_SLOPE, REGR_R2, REGR_COUNT, REGR_AVGX, REGR_AVGY
-
-## References
-For more details about linear regression functions, please refer to the SQL
standard documentation on aggregate functions.
diff --git a/docs/sql-manual/sql-functions/aggregate-functions/regr_slope.md
b/docs/sql-manual/sql-functions/aggregate-functions/regr_slope.md
deleted file mode 100644
index 85a090be310..00000000000
--- a/docs/sql-manual/sql-functions/aggregate-functions/regr_slope.md
+++ /dev/null
@@ -1,62 +0,0 @@
----
-{
- "title": "REGR_SLOPE",
- "language": "en"
-}
----
-
-## Description
-REGR_SLOPE is used to calculate the slope of the least squares-fit linear
equation for a set of number pairs.
-
-## Syntax
-```
-REGR_SLOPE(y, x)
-```
-
-## Parameters
-- `y` (Numeric): The dependent variable.
-- `x` (Numeric): The independent variable.
-
-Both `x` and `y` support basic numeric types.
-
-## Returned values
-Returned data type: FLOAT64
-
-The function returns the slope of the linear regression line.
-
-If there are no rows, or only rows with null values, the function returns NULL.
-
-## Examples
-```sql
--- Example 1: Basic Usage
-SELECT regr_slope(y, x) FROM test;
-
--- Example 2: Usage in a query with sample data
-SELECT * FROM test;
-+------+------+------+
-| id | x | y |
-+------+------+------+
-| 1 | 18 | 13 |
-| 3 | 12 | 2 |
-| 5 | 10 | 20 |
-| 2 | 14 | 27 |
-| 4 | 5 | 6 |
-+------+------+------+
-
-SELECT regr_slope(y, x) FROM test;
-+--------------------+
-| regr_slope(y, x) |
-+--------------------+
-| 0.6853448275862069 |
-+--------------------+
-```
-
-## Usage notes
-- This function ignores any pair where either value is null.
-- In cases where the calculation would result in a division by zero, the
function will return NULL.
-
-## Related functions
-REGR_INTERCEPT, REGR_R2, REGR_COUNT, REGR_AVGX, REGR_AVGY
-
-## References
-For more details about linear regression functions, please refer to the SQL
standard documentation on aggregate functions.
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/percentile_approx_weighted.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/percentile-approx-weighted.md
similarity index 54%
rename from
i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/percentile_approx_weighted.md
rename to
i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/percentile-approx-weighted.md
index 6645967819b..c6fcdb2730a 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/percentile_approx_weighted.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/percentile-approx-weighted.md
@@ -24,19 +24,20 @@ PERCENTILE_APPROX_WEIGHTED(<col>, <weight>, <p> [,
<compression>])
| 参数 | 说明 |
| -- | -- |
-| `<col>` | 需要计算百分位数的列 |
-| `<weight>` | 权重列,必须是正数 |
-| `<p>` | 百分位数值,取值范围 `[0.0, 1.0]`,例如 `0.99` 表示 `99` 分位数 |
-| `<compression>` | 可选参数,压缩度,取值范围 `[2048,
10000]`。值越大,精度越高,但内存消耗也越大。如果不指定或超出范围,则使用 `10000`。 |
+| `<col>` | 需要计算百分位数的列,支持类型为 Double 。 |
+| `<weight>` | 权重列,必须是正数,支持类型为 Double 。 |
+| `<p>` | 百分位数值,支持类型为 Double 。取值范围 `[0.0, 1.0]`,例如 `0.99` 表示 `99` 分位数 |
+| `<compression>` | 可选参数,支持类型为 Double 。表示压缩度,取值范围 `[2048,
10000]`。值越大,精度越高,但内存消耗也越大。如果不指定或超出范围,则使用 `10000`。 |
## 返回值
-返回一个 `DOUBLE` 类型的值,表示计算得到的加权近似百分位数。
+返回一个 Double 类型的值,表示计算得到的加权近似百分位数。
+如果组内没有合法数据,则返回 NULL 。
## 举例
```sql
--- 创建示例表
+-- setup
CREATE TABLE weighted_scores (
student_id INT,
score DECIMAL(10, 2),
@@ -46,8 +47,6 @@ DISTRIBUTED BY HASH(student_id) BUCKETS AUTO
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
-
--- 插入示例数据
INSERT INTO weighted_scores VALUES
(1, 85.5, 1), -- 普通作业分数,权重 1
(2, 90.0, 2), -- 重要作业分数,权重 2
@@ -59,8 +58,9 @@ INSERT INTO weighted_scores VALUES
(8, 89.5, 2),
(9, 94.0, 3),
(10, 83.5, 1);
+```
--- 计算带权重的分数分布
+```sql
SELECT
-- 计算不同压缩度下的 90 分位数
percentile_approx_weighted(score, weight, 0.9) as p90_default, --
默认压缩度
@@ -69,6 +69,8 @@ SELECT
FROM weighted_scores;
```
+计算带权重的分数分布。
+
```text
+------------------+------------------+------------------+
| p90_default | p90_fast | p90_accurate |
@@ -77,4 +79,30 @@ FROM weighted_scores;
+------------------+------------------+------------------+
```
+```sql
+select percentile_approx_weighted(if(score>95,score,null), weight, 0.9) from
weighted_scores;
+```
+只会计算输入的非 NULL 的数据。
+
+```text
++------------------------------------------------------------------+
+| percentile_approx_weighted(if(score>95,score,null), weight, 0.9) |
++------------------------------------------------------------------+
+| 95.5 |
++------------------------------------------------------------------+
+```
+
+```sql
+select percentile_approx_weighted(score, weight, 0.9, null) from
weighted_scores;
+```
+
+如果输入数据均为 NULL,则返回NULL。
+
+```text
++------------------------------------------------------+
+| percentile_approx_weighted(score, weight, 0.9, null) |
++------------------------------------------------------+
+| NULL |
++------------------------------------------------------+
+```
\ No newline at end of file
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/percentile-approx.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/percentile-approx.md
index a293e43e4a7..5bcf24501f1 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/percentile-approx.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/percentile-approx.md
@@ -23,18 +23,19 @@ PERCENTILE_APPROX(<col>, <p> [, <compression>])
| 参数 | 说明 |
| -- | -- |
-| `<col>` | 需要计算百分位数的列 |
-| `<p>` | 百分位数值,取值范围 `[0.0, 1.0]`,例如 `0.99` 表示 `99` 分位数 |
-| `<compression>` | 可选参数,压缩度,取值范围 `[2048,
10000]`。值越大,精度越高,但内存消耗也越大。如果不指定或超出范围,则使用 `10000`。 |
+| `<col>` | 需要计算百分位数的列,支持类型:Double。 |
+| `<p>` | 百分位数,常量,支持类型为 Double,取值范围 `[0.0, 1.0]`,如 `0.99` 表示 99 分位。必须为常量。 |
+| `<compression>` | 可选,压缩度,支持类型为 Double,取值范围 `[2048,
10000]`。值越大精度越高但内存消耗也越大。未指定或超出范围时默认 `10000`。 |
## 返回值
-返回一个 `DOUBLE` 类型的值,表示计算得到的近似百分位数。
+返回指定列的近似百分位数,类型为 Double。
+如果组内没有合法数据,则返回 NULL。
## 举例
```sql
--- 创建示例表
+-- setup
CREATE TABLE response_times (
request_id INT,
response_time DECIMAL(10, 2)
@@ -43,8 +44,6 @@ DISTRIBUTED BY HASH(`request_id`) BUCKETS AUTO
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
-
--- 插入示例数据
INSERT INTO response_times VALUES
(1, 10.5),
(2, 15.2),
@@ -56,7 +55,9 @@ INSERT INTO response_times VALUES
(8, 45.9),
(9, 50.4),
(10, 100.6);
+```
+```sql
-- 使用不同压缩度计算 99 分位数
SELECT
percentile_approx(response_time, 0.99) as p99_default, -- 默认压缩度
@@ -73,4 +74,32 @@ FROM response_times;
+-------------------+-------------------+-------------------+
```
+```sql
+SELECT percentile_approx(if(response_time>90,response_time,NULL), 0.5) FROM
response_times;
+```
+
+只计算非 NULL 数据。
+
+```text
++-----------------------------------------------------------------+
+| percentile_approx(if(response_time>90,response_time,NULL), 0.5) |
++-----------------------------------------------------------------+
+| 100.5999984741211 |
++-----------------------------------------------------------------+
+```
+
+```sql
+SELECT percentile_approx(NULL, 0.99) FROM response_times;
+```
+
+输入数据均为 NULL 时返回 NULL。
+
+```text
++-------------------------------+
+| percentile_approx(NULL, 0.99) |
++-------------------------------+
+| NULL |
++-------------------------------+
+```
+
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/percentile-array.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/percentile-array.md
index 992b5fad561..1f806f23c91 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/percentile-array.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/percentile-array.md
@@ -24,17 +24,18 @@ PERCENTILE_ARRAY(<col>, <array_p>)
| 参数 | 说明 |
| -- | -- |
-| `<col>` | 需要计算百分位数的列 |
-| `<array_p>` | 百分位数数组,数组中的每个元素必须在 `[0.0, 1.0]` 范围内,例如 `[0.5, 0.95, 0.99]` |
+| `<col>` |
需要被计算精确百分位数的列,支持类型:Double、Float、LargeInt、BigInt、Int、SmallInt、TinyInt。 |
+| `<array_p>` | 百分位数数组,数组中的每个元素必须为常量,类型为 Array<Double>,取值范围为 `[0.0, 1.0]`,如
`[0.5, 0.95, 0.99]`。 |
## 返回值
-返回一个 `DOUBLE` 类型的数组,包含了对应于输入百分位数数组的计算结果。
+返回一个 DOUBLE 类型的数组,包含了对应于输入百分位数数组的计算结果。
+如果组内没有合法数据,则返回空数组。
## 举例
```sql
--- 创建示例表
+-- setup
CREATE TABLE sales_data (
id INT,
amount DECIMAL(10, 2)
@@ -43,8 +44,6 @@ DISTRIBUTED BY HASH(`id`) BUCKETS AUTO
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
-
--- 插入示例数据
INSERT INTO sales_data VALUES
(1, 10.5),
(2, 15.2),
@@ -56,16 +55,47 @@ INSERT INTO sales_data VALUES
(8, 45.9),
(9, 50.4),
(10, 100.6);
+```
--- 计算多个百分位数
+```sql
SELECT percentile_array(amount, [0.25, 0.5, 0.75, 0.9]) as percentiles
FROM sales_data;
```
+计算多个百分位数。
+
+```text
++-----------------------------------------------------+
+| percentiles |
++-----------------------------------------------------+
+| [21.525000000000002, 33, 44.475, 55.41999999999998] |
++-----------------------------------------------------+
+```
+
+```sql
+SELECT percentile_array(if(amount>90, amount, NULL), [0.5, 0.99]) FROM
sales_data;
+```
+
+只计算非 NULL 数据。
+
+```text
++------------------------------------------------------------+
+| percentile_array(if(amount>90, amount, NULL), [0.5, 0.99]) |
++------------------------------------------------------------+
+| [100.6, 100.6] |
++------------------------------------------------------------+
+```
+
+```sql
+SELECT percentile_array(NULL, [0.5, 0.99]) FROM sales_data;
+```
+
+输入数据均为 NULL 时返回空数组。
+
```text
-+-----------------------------------------+
-| percentiles |
-+-----------------------------------------+
-| [21.25, 32.5, 43.75, 54.99999999999998] |
-+-----------------------------------------+
++-------------------------------------+
+| percentile_array(NULL, [0.5, 0.99]) |
++-------------------------------------+
+| [] |
++-------------------------------------+
```
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/percentile.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/percentile.md
index 69510333e43..a6d9c33cd02 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/percentile.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/percentile.md
@@ -8,7 +8,6 @@
## 描述
计算精确的百分位数,适用于小数据量。先对指定列降序排列,然后取精确的第 `p` 位百分数。`p` 的值介于 `0` 到 `1` 之间,如果 `p`
不指向精确的位置,则返回所指位置两侧相邻数值在 `p`
处的[线性插值](https://zh.wikipedia.org/wiki/%E7%BA%BF%E6%80%A7%E6%8F%92%E5%80%BC),注意这不是两数字的平均数。特殊情况:
-- 当输入的列为 `NULL` 时,返回 `NULL`
## 语法
@@ -20,17 +19,18 @@ PERCENTILE(<col>, <p>)
| 参数 | 说明 |
| -- | -- |
-| `<col>` | 需要被计算精确的百分位数的列,必须是整数类型的列。 |
-| `<p>` | 需要精确的百分位数,常量,取值为 `[0.0, 1.0]`。 |
+| `<col>` |
需要被计算精确的百分位数的列,支持类型:Double、Float、LargeInt、BigInt、Int、SmallInt、TinyInt。 |
+| `<p>` | 需要精确的百分位数,常量,支持类型:Double,取值范围为 `[0.0, 1.0]`。并且要求为常量(非运行时列)。 |
## 返回值
-返回指定列的精确的百分位数,类型为 `DOUBLE`。
+返回指定列的精确的百分位数,类型为 Double。
+如果组内没有合法数据,则返回 NULL 。
## 举例
```sql
--- 创建示例表
+-- setup
CREATE TABLE sales_data
(
product_id INT,
@@ -40,8 +40,6 @@ DISTRIBUTED BY HASH(`product_id`) BUCKETS AUTO
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
-
--- 插入示例数据
INSERT INTO sales_data VALUES
(1, 10.00),
(1, 15.00),
@@ -53,8 +51,9 @@ INSERT INTO sales_data VALUES
(1, 45.00),
(1, 50.00),
(1, 100.00);
+```
--- 计算不同百分位的销售价格
+```sql
SELECT
percentile(sale_price, 0.5) as median_price, -- 中位数
percentile(sale_price, 0.75) as p75_price, -- 75 分位数
@@ -64,6 +63,8 @@ SELECT
FROM sales_data;
```
+计算不同百分位的销售价格。
+
```text
+--------------+-----------+-------------------+-------------------+----------+
| median_price | p75_price | p90_price | p95_price | p99_null |
@@ -71,3 +72,31 @@ FROM sales_data;
| 32.5 | 43.75 | 54.99999999999998 | 77.49999999999994 | NULL |
+--------------+-----------+-------------------+-------------------+----------+
```
+
+```sql
+select percentile(if(sale_price>90,sale_price,NULL), 0.5) from sales_data;
+```
+
+只会计算输入的非 NULL 的数据。
+
+```text
++----------------------------------------------------+
+| percentile(if(sale_price>90,sale_price,NULL), 0.5) |
++----------------------------------------------------+
+| 100 |
++----------------------------------------------------+
+```
+
+```sql
+select percentile(sale_price, NULL) from sales_data;
+```
+
+如果输入数据均为 NULL,则返回NULL。
+
+```text
++------------------------------+
+| percentile(sale_price, NULL) |
++------------------------------+
+| NULL |
++------------------------------+
+```
\ No newline at end of file
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/regr-intercept.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/regr-intercept.md
index 44d030a2a66..00e5d773cfb 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/regr-intercept.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/regr-intercept.md
@@ -23,17 +23,18 @@ REGR_INTERCEPT(<y>, <x>)
| 参数 | 说明 |
| -- | -- |
-| `<y>` | 因变量,必须是可以计算为数值类型的表达式。 |
-| `<x>` | 自变量,必须是可以计算为数值类型的表达式。 |
+| `<y>` | 因变量,支持类型为 Double。 |
+| `<x>` | 自变量,支持类型为 Double。 |
## 返回值
-返回 `DOUBLE` 类型的值,表示线性回归线与 `y` 轴的交点。
+返回 Double 类型的值,表示线性回归线与 `y` 轴的交点。
+如果没有行,或者只有包含空值的行,函数返回 NULL。
## 举例
```sql
--- 创建示例表
+-- setup
CREATE TABLE test_regr_intercept (
`id` int,
`x` int,
@@ -51,15 +52,32 @@ INSERT INTO test_regr_intercept VALUES
(3, 12, 2),
(4, 5, 6),
(5, 10, 20);
+```
--- 计算 x 和 y 的线性回归截距
+```sql
SELECT REGR_INTERCEPT(y, x) FROM test_regr_intercept;
```
+计算 x 和 y 的线性回归截距。
+
```text
+-------------------------+
| regr_intercept(y, x) |
+-------------------------+
| 5.512931034482759 |
+-------------------------+
-```
\ No newline at end of file
+```
+
+```sql
+SELECT REGR_INTERCEPT(y, x) FROM test_regr_intercept where x>100;
+```
+
+组内没有数据时,返回 NULL 。
+
+```text
++----------------------+
+| REGR_INTERCEPT(y, x) |
++----------------------+
+| NULL |
++----------------------+
+```
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/regr-slope.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/regr-slope.md
index c2fc4c89830..8af7b025cb4 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/regr-slope.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/regr-slope.md
@@ -20,17 +20,18 @@ REGR_SLOPE(<y>, <x>)
| 参数 | 说明 |
| -- | -- |
-| `<y>` | 因变量,必须是可以计算为数值类型的表达式。 |
-| `<x>` | 自变量,必须是可以计算为数值类型的表达式。 |
+| `<y>` | 因变量,支持类型为 Double。 |
+| `<x>` | 自变量,支持类型为 Double。 |
## 返回值
-返回 DOUBLE 类型的值,表示线性回归线的斜率。
+返回 Double 类型的值,表示线性回归线的斜率。
+如果没有行,或者只有包含空值的行,函数返回 NULL。
## 举例
```sql
--- 创建示例表
+-- setup
CREATE TABLE test_regr_slope (
`id` int,
`x` int,
@@ -48,15 +49,32 @@ INSERT INTO test_regr_slope VALUES
(3, 12, 2),
(4, 5, 6),
(5, 10, 20);
+```
--- 计算 x 和 y 的线性回归斜率
+```sql
SELECT REGR_SLOPE(y, x) FROM test_regr_slope;
```
+计算 x 和 y 的线性回归截距。
+
+```text
++--------------------+
+| REGR_SLOPE(y, x) |
++--------------------+
+| 0.6853448275862069 |
++--------------------+
+```
+
+```sql
+SELECT REGR_SLOPE(y, x) FROM test_regr_slope where x>100;
+```
+
+组内没有数据时,返回 NULL 。
+
```text
-+----------------------+
-| regr_slope(y, x) |
-+----------------------+
-| 0.6853448275862069 |
-+----------------------+
++------------------+
+| REGR_SLOPE(y, x) |
++------------------+
+| NULL |
++------------------+
```
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/regr_intercept.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/regr_intercept.md
deleted file mode 100644
index a619daac246..00000000000
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/regr_intercept.md
+++ /dev/null
@@ -1,62 +0,0 @@
----
-{
- "title": "REGR_INTERCEPT",
- "language": "zh-CN"
-}
----
-
-## Description
-REGR_INTERCEPT 用于计算一组数值对的最小二乘拟合线性方程的截距。
-
-## Syntax
-```
-REGR_INTERCEPT(y, x)
-```
-
-## Parameters
-- `y` (数值类型):因变量。
-- `x` (数值类型):自变量。
-
-x 和 y 都支持基本数值类型。
-
-## Returned values
-返回数据类型:FLOAT64
-
-函数返回线性回归直线的截距。
-
-如果没有行,或者只有包含空值的行,函数返回 NULL。
-
-## Examples
-```sql
--- 示例 1:基本用法
-SELECT regr_intercept(y, x) FROM test;
-
--- 示例 2:在查询中使用示例数据
-SELECT * FROM test;
-+------+------+------+
-| id | x | y |
-+------+------+------+
-| 1 | 18 | 13 |
-| 3 | 12 | 2 |
-| 5 | 10 | 20 |
-| 2 | 14 | 27 |
-| 4 | 5 | 6 |
-+------+------+------+
-
-SELECT regr_intercept(y, x) FROM test;
-+----------------------+
-| regr_intercept(y, x) |
-+----------------------+
-| 5.512931034482759 |
-+----------------------+
-```
-
-## Usage notes
-- 此函数会忽略任何包含空值的数值对。
-- 在计算结果会导致除以零的情况下,函数将返回 NULL。
-
-## Related functions
-REGR_SLOPE, REGR_R2, REGR_COUNT, REGR_AVGX, REGR_AVGY
-
-## References
-有关线性回归函数的更多详细信息,请参阅 SQL 标准文档中关于聚合函数的部分。
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/regr_slope.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/regr_slope.md
deleted file mode 100644
index 295ef814cab..00000000000
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/regr_slope.md
+++ /dev/null
@@ -1,62 +0,0 @@
----
-{
- "title": "REGR_SLOPE",
- "language": "zh-CN"
-}
----
-
-## Description
-REGR_SLOPE 用于计算一组数值对的最小二乘拟合线性方程的斜率。
-
-## Syntax
-```
-REGR_SLOPE(y, x)
-```
-
-## Parameters
-- `y` (数值类型):因变量。
-- `x` (数值类型):自变量。
-
-x 和 y 都支持基本数值类型。
-
-## Returned values
-返回数据类型:FLOAT64
-
-函数返回线性回归直线的斜率。
-
-如果没有行,或者只有包含空值的行,函数返回 NULL。
-
-## Examples
-```sql
--- 示例 1:基本用法
-SELECT regr_slope(y, x) FROM test;
-
--- 示例 2:在查询中使用示例数据
-SELECT * FROM test;
-+------+------+------+
-| id | x | y |
-+------+------+------+
-| 1 | 18 | 13 |
-| 3 | 12 | 2 |
-| 5 | 10 | 20 |
-| 2 | 14 | 27 |
-| 4 | 5 | 6 |
-+------+------+------+
-
-SELECT regr_slope(y, x) FROM test;
-+--------------------+
-| regr_slope(y, x) |
-+--------------------+
-| 0.6853448275862069 |
-+--------------------+
-```
-
-## Usage notes
-- 此函数会忽略任何包含空值的数值对。
-- 在计算结果会导致除以零的情况下,函数将返回 NULL。
-
-## Related functions
-REGR_INTERCEPT, REGR_R2, REGR_COUNT, REGR_AVGX, REGR_AVGY
-
-## References
-有关线性回归函数的更多详细信息,请参阅 SQL 标准文档中关于聚合函数的部分。
diff --git a/sidebars.json b/sidebars.json
index 131b13642ef..cc1b03c797a 100644
--- a/sidebars.json
+++ b/sidebars.json
@@ -1743,12 +1743,10 @@
"sql-manual/sql-functions/aggregate-functions/percentile",
"sql-manual/sql-functions/aggregate-functions/percentile-approx",
"sql-manual/sql-functions/aggregate-functions/percentile-array",
-
"sql-manual/sql-functions/aggregate-functions/percentile_approx_weighted",
+
"sql-manual/sql-functions/aggregate-functions/percentile-approx-weighted",
"sql-manual/sql-functions/aggregate-functions/quantile-union",
"sql-manual/sql-functions/aggregate-functions/regr-intercept",
"sql-manual/sql-functions/aggregate-functions/regr-slope",
-
"sql-manual/sql-functions/aggregate-functions/regr_intercept",
-
"sql-manual/sql-functions/aggregate-functions/regr_slope",
"sql-manual/sql-functions/aggregate-functions/retention",
"sql-manual/sql-functions/aggregate-functions/sequence-count",
"sql-manual/sql-functions/aggregate-functions/sequence-match",
@@ -1831,7 +1829,7 @@
"sql-manual/sql-functions/aggregate-functions/percentile",
"sql-manual/sql-functions/aggregate-functions/percentile-approx",
"sql-manual/sql-functions/aggregate-functions/percentile-array",
-
"sql-manual/sql-functions/aggregate-functions/percentile_approx_weighted",
+
"sql-manual/sql-functions/aggregate-functions/percentile-approx-weighted",
"sql-manual/sql-functions/aggregate-functions/quantile-union",
"sql-manual/sql-functions/aggregate-functions/regr-intercept",
"sql-manual/sql-functions/aggregate-functions/regr-slope",
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]