This is an automated email from the ASF dual-hosted git repository. critas pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/iotdb-docs.git
The following commit(s) were added to refs/heads/main by this push:
new e429339f add approx_count_distinct aggregate function in table model
(#754)
e429339f is described below
commit e429339f8474598bc49f9da9cf8f84489cd529c8
Author: leto-b <[email protected]>
AuthorDate: Wed Jun 4 09:41:32 2025 +0800
add approx_count_distinct aggregate function in table model (#754)
---
.../Master/Table/SQL-Manual/Basis-Function.md | 78 +++++++++++++--------
.../latest-Table/SQL-Manual/Basis-Function.md | 78 +++++++++++++--------
.../Master/Table/SQL-Manual/Basis-Function.md | 80 ++++++++++++++--------
.../latest-Table/SQL-Manual/Basis-Function.md | 80 ++++++++++++++--------
4 files changed, 202 insertions(+), 114 deletions(-)
diff --git a/src/UserGuide/Master/Table/SQL-Manual/Basis-Function.md
b/src/UserGuide/Master/Table/SQL-Manual/Basis-Function.md
index 2a5dfcc2..44a0b67b 100644
--- a/src/UserGuide/Master/Table/SQL-Manual/Basis-Function.md
+++ b/src/UserGuide/Master/Table/SQL-Manual/Basis-Function.md
@@ -156,28 +156,29 @@ SELECT LEAST(temperature,humidity) FROM table2;
### 2.2 Supported Aggregate Functions
-| Function Name | Description
| Allowed Input Types
[...]
-|:--------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------
[...]
-| COUNT | Counts the number of data points.
| All types
[...]
-| COUNT_IF | COUNT_IF(exp) counts the number of rows that satisfy a
specified boolean expression.
| `exp` must be a boolean
expression,(e.g. `c [...]
-| SUM | Calculates the sum.
| INT32 INT64 FLOAT DOUBLE
[...]
-| AVG | Calculates the average.
| INT32 INT64 FLOAT DOUBLE
[...]
-| MAX | Finds the maximum value.
| All types
[...]
-| MIN | Finds the minimum value.
| All types
[...]
-| FIRST | Finds the value with the smallest timestamp that is not
NULL.
| All types
[...]
-| LAST | Finds the value with the largest timestamp that is not NULL.
| All types
[...]
-| STDDEV | Alias for STDDEV_SAMP, calculates the sample standard
deviation.
| INT32 INT64 FLOAT
DOUBLE [...]
-| STDDEV_POP | Calculates the population standard deviation.
| INT32 INT64 FLOAT DOUBLE
[...]
-| STDDEV_SAMP | Calculates the sample standard deviation.
| INT32 INT64 FLOAT DOUBLE
[...]
-| VARIANCE | Alias for VAR_SAMP, calculates the sample variance.
| INT32 INT64 FLOAT DOUBLE
[...]
-| VAR_POP | Calculates the population variance.
| INT32 INT64 FLOAT DOUBLE
[...]
-| VAR_SAMP | Calculates the sample variance.
| INT32 INT64 FLOAT DOUBLE
[...]
-| EXTREME | Finds the value with the largest absolute value. If the
largest absolute values of positive and negative values are equal, returns the
positive value.
| INT32 INT64 FLOAT
DOUBLE [...]
-| MODE | Finds the mode. Note: 1. There is a risk of memory exception
when the number of distinct values in the input sequence is too large; 2. If
all elements have the same frequency, i.e., there is no mode, a random element
is returned; 3. If there are multiple modes, a random mode is returned; 4. NULL
values are also counted in frequency, so even if not all values in the input
sequence are NULL, the final result may still be NULL. | All types
[...]
-| MAX_BY | MAX_BY(x, y) finds the value of x corresponding to the
maximum y in the binary input x and y. MAX_BY(time, x) returns the timestamp
when x is at its maximum.
| x and y can be of
any type [...]
-| MIN_BY | MIN_BY(x, y) finds the value of x corresponding to the
minimum y in the binary input x and y. MIN_BY(time, x) returns the timestamp
when x is at its minimum.
| x and y can be of
any type [...]
-| FIRST_BY | FIRST_BY(x, y) finds the value of x in the same row when y
is the first non-null value.
| x and y can be of any type
[...]
-| LAST_BY | LAST_BY(x, y) finds the value of x in the same row when y is
the last non-null value.
| x and y can be of any type
[...]
+| Function Name | Description
[...]
+|:-----------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[...]
+| COUNT | Counts the number of data points.
[...]
+| COUNT_IF | COUNT_IF(exp) counts the number of rows that
satisfy a specified boolean expression.
[...]
+| APPROX_COUNT_DISTINCT | The APPROX_COUNT_DISTINCT(x[, maxStandardError])
function provides an approximation of COUNT(DISTINCT x), returning the
estimated number of distinct input values.
[...]
+| SUM | Calculates the sum.
[...]
+| AVG | Calculates the average.
[...]
+| MAX | Finds the maximum value.
[...]
+| MIN | Finds the minimum value.
[...]
+| FIRST | Finds the value with the smallest timestamp that is
not NULL.
[...]
+| LAST | Finds the value with the largest timestamp that is
not NULL.
[...]
+| STDDEV | Alias for STDDEV_SAMP, calculates the sample
standard deviation.
[...]
+| STDDEV_POP | Calculates the population standard deviation.
[...]
+| STDDEV_SAMP | Calculates the sample standard deviation.
[...]
+| VARIANCE | Alias for VAR_SAMP, calculates the sample
variance.
[...]
+| VAR_POP | Calculates the population variance.
[...]
+| VAR_SAMP | Calculates the sample variance.
[...]
+| EXTREME | Finds the value with the largest absolute value. If
the largest absolute values of positive and negative values are equal, returns
the positive value.
[...]
+| MODE | Finds the mode. Note: 1. There is a risk of memory
exception when the number of distinct values in the input sequence is too
large; 2. If all elements have the same frequency, i.e., there is no mode, a
random element is returned; 3. If there are multiple modes, a random mode is
returned; 4. NULL values are also counted in frequency, so even if not all
values in the input sequence are NULL, the final result may still be NULL.
[...]
+| MAX_BY | MAX_BY(x, y) finds the value of x corresponding to
the maximum y in the binary input x and y. MAX_BY(time, x) returns the
timestamp when x is at its maximum.
[...]
+| MIN_BY | MIN_BY(x, y) finds the value of x corresponding to
the minimum y in the binary input x and y. MIN_BY(time, x) returns the
timestamp when x is at its minimum.
[...]
+| FIRST_BY | FIRST_BY(x, y) finds the value of x in the same row
when y is the first non-null value.
[...]
+| LAST_BY | LAST_BY(x, y) finds the value of x in the same row
when y is the last non-null value.
[...]
### 2.3 Examples
@@ -229,8 +230,29 @@ Total line number = 1
It costs 0.047s
```
+#### 2.3.4 Approx_count_distinct
-#### 2.3.4 First
+Retrieve the number of distinct values in the `temperature` column from
`table1`.
+
+```sql
+IoTDB> SELECT COUNT(DISTINCT temperature) as origin,
APPROX_COUNT_DISTINCT(temperature) as approx FROM table1;
+IoTDB> SELECT COUNT(DISTINCT temperature) as origin,
APPROX_COUNT_DISTINCT(temperature,0.006) as approx FROM table1;
+```
+
+The execution result is as follows:
+
+```sql
++------+------+
+|origin|approx|
++------+------+
+| 3| 3|
++------+------+
+Total line number = 1
+It costs 0.022s
+```
+
+
+#### 2.3.5 First
Finds the values with the smallest timestamp that are not NULL in the
`temperature` and `humidity` columns.
@@ -250,7 +272,7 @@ Total line number = 1
It costs 0.170s
```
-#### 2.3.5 Last
+#### 2.3.6 Last
Finds the values with the largest timestamp that are not NULL in the
`temperature` and `humidity` columns.
@@ -270,7 +292,7 @@ Total line number = 1
It costs 0.211s
```
-#### 2.3.6 First_by
+#### 2.3.7 First_by
Finds the `time` value of the row with the smallest timestamp that is not NULL
in the `temperature` column, and the `humidity` value of the row with the
smallest timestamp that is not NULL in the `temperature` column.
@@ -290,7 +312,7 @@ Total line number = 1
It costs 0.269s
```
-#### 2.3.7 Last_by
+#### 2.3.8 Last_by
Queries the `time` value of the row with the largest timestamp that is not
NULL in the `temperature` column, and the `humidity` value of the row with the
largest timestamp that is not NULL in the `temperature` column.
@@ -310,7 +332,7 @@ Total line number = 1
It costs 0.070s
```
-#### 2.3.8 Max_by
+#### 2.3.9 Max_by
Queries the `time` value of the row where the `temperature` column is at its
maximum, and the `humidity` value of the row where the `temperature` column is
at its maximum.
@@ -330,7 +352,7 @@ Total line number = 1
It costs 0.172s
```
-#### 2.3.9 Min_by
+#### 2.3.10 Min_by
Queries the `time` value of the row where the `temperature` column is at its
minimum, and the `humidity` value of the row where the `temperature` column is
at its minimum.
diff --git a/src/UserGuide/latest-Table/SQL-Manual/Basis-Function.md
b/src/UserGuide/latest-Table/SQL-Manual/Basis-Function.md
index 2a5dfcc2..44a0b67b 100644
--- a/src/UserGuide/latest-Table/SQL-Manual/Basis-Function.md
+++ b/src/UserGuide/latest-Table/SQL-Manual/Basis-Function.md
@@ -156,28 +156,29 @@ SELECT LEAST(temperature,humidity) FROM table2;
### 2.2 Supported Aggregate Functions
-| Function Name | Description
| Allowed Input Types
[...]
-|:--------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------
[...]
-| COUNT | Counts the number of data points.
| All types
[...]
-| COUNT_IF | COUNT_IF(exp) counts the number of rows that satisfy a
specified boolean expression.
| `exp` must be a boolean
expression,(e.g. `c [...]
-| SUM | Calculates the sum.
| INT32 INT64 FLOAT DOUBLE
[...]
-| AVG | Calculates the average.
| INT32 INT64 FLOAT DOUBLE
[...]
-| MAX | Finds the maximum value.
| All types
[...]
-| MIN | Finds the minimum value.
| All types
[...]
-| FIRST | Finds the value with the smallest timestamp that is not
NULL.
| All types
[...]
-| LAST | Finds the value with the largest timestamp that is not NULL.
| All types
[...]
-| STDDEV | Alias for STDDEV_SAMP, calculates the sample standard
deviation.
| INT32 INT64 FLOAT
DOUBLE [...]
-| STDDEV_POP | Calculates the population standard deviation.
| INT32 INT64 FLOAT DOUBLE
[...]
-| STDDEV_SAMP | Calculates the sample standard deviation.
| INT32 INT64 FLOAT DOUBLE
[...]
-| VARIANCE | Alias for VAR_SAMP, calculates the sample variance.
| INT32 INT64 FLOAT DOUBLE
[...]
-| VAR_POP | Calculates the population variance.
| INT32 INT64 FLOAT DOUBLE
[...]
-| VAR_SAMP | Calculates the sample variance.
| INT32 INT64 FLOAT DOUBLE
[...]
-| EXTREME | Finds the value with the largest absolute value. If the
largest absolute values of positive and negative values are equal, returns the
positive value.
| INT32 INT64 FLOAT
DOUBLE [...]
-| MODE | Finds the mode. Note: 1. There is a risk of memory exception
when the number of distinct values in the input sequence is too large; 2. If
all elements have the same frequency, i.e., there is no mode, a random element
is returned; 3. If there are multiple modes, a random mode is returned; 4. NULL
values are also counted in frequency, so even if not all values in the input
sequence are NULL, the final result may still be NULL. | All types
[...]
-| MAX_BY | MAX_BY(x, y) finds the value of x corresponding to the
maximum y in the binary input x and y. MAX_BY(time, x) returns the timestamp
when x is at its maximum.
| x and y can be of
any type [...]
-| MIN_BY | MIN_BY(x, y) finds the value of x corresponding to the
minimum y in the binary input x and y. MIN_BY(time, x) returns the timestamp
when x is at its minimum.
| x and y can be of
any type [...]
-| FIRST_BY | FIRST_BY(x, y) finds the value of x in the same row when y
is the first non-null value.
| x and y can be of any type
[...]
-| LAST_BY | LAST_BY(x, y) finds the value of x in the same row when y is
the last non-null value.
| x and y can be of any type
[...]
+| Function Name | Description
[...]
+|:-----------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[...]
+| COUNT | Counts the number of data points.
[...]
+| COUNT_IF | COUNT_IF(exp) counts the number of rows that
satisfy a specified boolean expression.
[...]
+| APPROX_COUNT_DISTINCT | The APPROX_COUNT_DISTINCT(x[, maxStandardError])
function provides an approximation of COUNT(DISTINCT x), returning the
estimated number of distinct input values.
[...]
+| SUM | Calculates the sum.
[...]
+| AVG | Calculates the average.
[...]
+| MAX | Finds the maximum value.
[...]
+| MIN | Finds the minimum value.
[...]
+| FIRST | Finds the value with the smallest timestamp that is
not NULL.
[...]
+| LAST | Finds the value with the largest timestamp that is
not NULL.
[...]
+| STDDEV | Alias for STDDEV_SAMP, calculates the sample
standard deviation.
[...]
+| STDDEV_POP | Calculates the population standard deviation.
[...]
+| STDDEV_SAMP | Calculates the sample standard deviation.
[...]
+| VARIANCE | Alias for VAR_SAMP, calculates the sample
variance.
[...]
+| VAR_POP | Calculates the population variance.
[...]
+| VAR_SAMP | Calculates the sample variance.
[...]
+| EXTREME | Finds the value with the largest absolute value. If
the largest absolute values of positive and negative values are equal, returns
the positive value.
[...]
+| MODE | Finds the mode. Note: 1. There is a risk of memory
exception when the number of distinct values in the input sequence is too
large; 2. If all elements have the same frequency, i.e., there is no mode, a
random element is returned; 3. If there are multiple modes, a random mode is
returned; 4. NULL values are also counted in frequency, so even if not all
values in the input sequence are NULL, the final result may still be NULL.
[...]
+| MAX_BY | MAX_BY(x, y) finds the value of x corresponding to
the maximum y in the binary input x and y. MAX_BY(time, x) returns the
timestamp when x is at its maximum.
[...]
+| MIN_BY | MIN_BY(x, y) finds the value of x corresponding to
the minimum y in the binary input x and y. MIN_BY(time, x) returns the
timestamp when x is at its minimum.
[...]
+| FIRST_BY | FIRST_BY(x, y) finds the value of x in the same row
when y is the first non-null value.
[...]
+| LAST_BY | LAST_BY(x, y) finds the value of x in the same row
when y is the last non-null value.
[...]
### 2.3 Examples
@@ -229,8 +230,29 @@ Total line number = 1
It costs 0.047s
```
+#### 2.3.4 Approx_count_distinct
-#### 2.3.4 First
+Retrieve the number of distinct values in the `temperature` column from
`table1`.
+
+```sql
+IoTDB> SELECT COUNT(DISTINCT temperature) as origin,
APPROX_COUNT_DISTINCT(temperature) as approx FROM table1;
+IoTDB> SELECT COUNT(DISTINCT temperature) as origin,
APPROX_COUNT_DISTINCT(temperature,0.006) as approx FROM table1;
+```
+
+The execution result is as follows:
+
+```sql
++------+------+
+|origin|approx|
++------+------+
+| 3| 3|
++------+------+
+Total line number = 1
+It costs 0.022s
+```
+
+
+#### 2.3.5 First
Finds the values with the smallest timestamp that are not NULL in the
`temperature` and `humidity` columns.
@@ -250,7 +272,7 @@ Total line number = 1
It costs 0.170s
```
-#### 2.3.5 Last
+#### 2.3.6 Last
Finds the values with the largest timestamp that are not NULL in the
`temperature` and `humidity` columns.
@@ -270,7 +292,7 @@ Total line number = 1
It costs 0.211s
```
-#### 2.3.6 First_by
+#### 2.3.7 First_by
Finds the `time` value of the row with the smallest timestamp that is not NULL
in the `temperature` column, and the `humidity` value of the row with the
smallest timestamp that is not NULL in the `temperature` column.
@@ -290,7 +312,7 @@ Total line number = 1
It costs 0.269s
```
-#### 2.3.7 Last_by
+#### 2.3.8 Last_by
Queries the `time` value of the row with the largest timestamp that is not
NULL in the `temperature` column, and the `humidity` value of the row with the
largest timestamp that is not NULL in the `temperature` column.
@@ -310,7 +332,7 @@ Total line number = 1
It costs 0.070s
```
-#### 2.3.8 Max_by
+#### 2.3.9 Max_by
Queries the `time` value of the row where the `temperature` column is at its
maximum, and the `humidity` value of the row where the `temperature` column is
at its maximum.
@@ -330,7 +352,7 @@ Total line number = 1
It costs 0.172s
```
-#### 2.3.9 Min_by
+#### 2.3.10 Min_by
Queries the `time` value of the row where the `temperature` column is at its
minimum, and the `humidity` value of the row where the `temperature` column is
at its minimum.
diff --git a/src/zh/UserGuide/Master/Table/SQL-Manual/Basis-Function.md
b/src/zh/UserGuide/Master/Table/SQL-Manual/Basis-Function.md
index 33ca019b..528d738e 100644
--- a/src/zh/UserGuide/Master/Table/SQL-Manual/Basis-Function.md
+++ b/src/zh/UserGuide/Master/Table/SQL-Manual/Basis-Function.md
@@ -155,28 +155,29 @@ SELECT LEAST(temperature,humidity) FROM table2;
### 2.2 支持的聚合函数
-| 函数名 | 功能描述 |
允许的输入类型 | 输出类型 |
-| ----------- | ------------------------------------------------------------
|-----------------------------------------------|------------------|
-| COUNT | 计算数据点数。 | 所有类型
| INT64 |
-| COUNT_IF | COUNT_IF(exp) 用于统计满足指定布尔表达式的记录行数 | exp 必须是一个布尔类型的表达式,例如
count_if(temperature>20) | INT64 |
-| SUM | 求和。 |
INT32 INT64 FLOAT DOUBLE | DOUBLE |
-| AVG | 求平均值。 |
INT32 INT64 FLOAT DOUBLE | DOUBLE |
-| MAX | 求最大值。 | 所有类型
| 与输入类型一致 |
-| MIN | 求最小值。 | 所有类型
| 与输入类型一致 |
-| FIRST | 求时间戳最小且不为 NULL 的值。 | 所有类型
| 与输入类型一致 |
-| LAST | 求时间戳最大且不为 NULL 的值。 | 所有类型
| 与输入类型一致 |
-| STDDEV | STDDEV_SAMP 的别名,求样本标准差。 | INT32
INT64 FLOAT DOUBLE | DOUBLE |
-| STDDEV_POP | 求总体标准差。 | INT32
INT64 FLOAT DOUBLE | DOUBLE |
-| STDDEV_SAMP | 求样本标准差。 | INT32
INT64 FLOAT DOUBLE | DOUBLE |
-| VARIANCE | VAR_SAMP 的别名,求样本方差。 | INT32
INT64 FLOAT DOUBLE | DOUBLE |
-| VAR_POP | 求总体方差。 | INT32
INT64 FLOAT DOUBLE | DOUBLE |
-| VAR_SAMP | 求样本方差。 | INT32
INT64 FLOAT DOUBLE | DOUBLE |
-| EXTREME | 求具有最大绝对值的值。如果正值和负值的最大绝对值相等,则返回正值。 | INT32 INT64 FLOAT DOUBLE
| 与输入类型一致 |
-| MODE | 求众数。注意: 1.输入序列的不同值个数过多时会有内存异常风险;
2.如果所有元素出现的频次相同,即没有众数,则随机返回一个元素; 3.如果有多个众数,则随机返回一个众数; 4. NULL
值也会被统计频次,所以即使输入序列的值不全为 NULL,最终结果也可能为 NULL。 | 所有类型
| 与输入类型一致 |
-| MAX_BY | MAX_BY(x, y) 求二元输入 x 和 y 在 y 最大时对应的 x 的值。MAX_BY(time, x) 返回 x
取最大值时对应的时间戳。 | x 和 y 可以是任意类型 | 与第一个输入 x 的数据类型一致
|
-| MIN_BY | MIN_BY(x, y) 求二元输入 x 和 y 在 y 最小时对应的 x 的值。MIN_BY(time, x) 返回 x
取最小值时对应的时间戳。 | x 和 y 可以是任意类型 | 与第一个输入 x 的数据类型一致
|
-| FIRST_BY | FIRST_BY(x, y) 求当 y 为第一个不为 NULL 的值时,同一行里对应的 x 值。 | x 和 y
可以是任意类型 | 与第一个输入 x 的数据类型一致 |
-| LAST_BY | LAST_BY(x, y) 求当 y 为最后一个不为 NULL 的值时,同一行里对应的 x 值。 | x 和 y
可以是任意类型 | 与第一个输入 x 的数据类型一致 |
+| 函数名 | 功能描述
| 允许的输入类型
| 输出类型 |
+|-----------------------|------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------|------------------|
+| COUNT | 计算数据点数。
| 所有类型
| INT64 |
+| COUNT_IF | COUNT_IF(exp) 用于统计满足指定布尔表达式的记录行数
| exp 必须是一个布尔类型的表达式,例如 count_if(temperature>20)
| INT64 |
+| APPROX_COUNT_DISTINCT | APPROX_COUNT_DISTINCT(x[,maxStandardError]) 函数提供
COUNT(DISTINCT x) 的近似值,返回不同输入值的近似个数。 |
x:待计算列,支持所有类型;<br> maxStandardError:指定该函数应产生的最大标准误差,取值范围[0.0040625,
0.26],未指定值时默认0.023。 | INT64 |
+| SUM | 求和。
| INT32 INT64 FLOAT DOUBLE
| DOUBLE |
+| AVG | 求平均值。
| INT32 INT64 FLOAT DOUBLE
| DOUBLE |
+| MAX | 求最大值。
| 所有类型
| 与输入类型一致 |
+| MIN | 求最小值。
| 所有类型
| 与输入类型一致 |
+| FIRST | 求时间戳最小且不为 NULL 的值。
| 所有类型
| 与输入类型一致 |
+| LAST | 求时间戳最大且不为 NULL 的值。
| 所有类型
| 与输入类型一致 |
+| STDDEV | STDDEV_SAMP 的别名,求样本标准差。
| INT32 INT64 FLOAT DOUBLE
| DOUBLE |
+| STDDEV_POP | 求总体标准差。
| INT32 INT64 FLOAT DOUBLE
| DOUBLE |
+| STDDEV_SAMP | 求样本标准差。
| INT32 INT64 FLOAT DOUBLE
| DOUBLE |
+| VARIANCE | VAR_SAMP 的别名,求样本方差。
| INT32 INT64 FLOAT DOUBLE
| DOUBLE |
+| VAR_POP | 求总体方差。
| INT32 INT64 FLOAT DOUBLE
| DOUBLE |
+| VAR_SAMP | 求样本方差。
| INT32 INT64 FLOAT DOUBLE
| DOUBLE |
+| EXTREME | 求具有最大绝对值的值。如果正值和负值的最大绝对值相等,则返回正值。
| INT32 INT64 FLOAT DOUBLE
| 与输入类型一致 |
+| MODE | 求众数。注意: 1.输入序列的不同值个数过多时会有内存异常风险;
2.如果所有元素出现的频次相同,即没有众数,则随机返回一个元素; 3.如果有多个众数,则随机返回一个众数; 4. NULL
值也会被统计频次,所以即使输入序列的值不全为 NULL,最终结果也可能为 NULL。 | 所有类型
| 与输入类型一致 |
+| MAX_BY | MAX_BY(x, y) 求二元输入 x 和 y 在 y 最大时对应的 x
的值。MAX_BY(time, x) 返回 x 取最大值时对应的时间戳。
| x 和 y 可以是任意类型
| 与第一个输入 x 的数据类型一致 |
+| MIN_BY | MIN_BY(x, y) 求二元输入 x 和 y 在 y 最小时对应的 x
的值。MIN_BY(time, x) 返回 x 取最小值时对应的时间戳。
| x 和 y 可以是任意类型
| 与第一个输入 x 的数据类型一致 |
+| FIRST_BY | FIRST_BY(x, y) 求当 y 为第一个不为 NULL 的值时,同一行里对应的 x 值。
| x 和 y 可以是任意类型
| 与第一个输入 x 的数据类型一致 |
+| LAST_BY | LAST_BY(x, y) 求当 y 为最后一个不为 NULL 的值时,同一行里对应的 x 值。
| x 和 y 可以是任意类型
| 与第一个输入 x 的数据类型一致 |
### 2.3 示例
@@ -213,7 +214,7 @@ It costs 0.834s
统计 `table2` 中 到达时间 `arrival_time` 不是 `null` 的记录行数。
```sql
-select count_if(arrival_time is not null) from table2;
+IoTDB> select count_if(arrival_time is not null) from table2;
```
执行结果如下:
@@ -228,8 +229,29 @@ Total line number = 1
It costs 0.047s
```
+#### 2.3.4 Approx_count_distinct
-#### 2.3.4 First
+查询 `table1` 中 `temperature` 列不同值的个数。
+
+```sql
+IoTDB> SELECT COUNT(DISTINCT temperature) as origin,
APPROX_COUNT_DISTINCT(temperature) as approx FROM table1;
+IoTDB> SELECT COUNT(DISTINCT temperature) as origin,
APPROX_COUNT_DISTINCT(temperature,0.006) as approx FROM table1;
+```
+
+执行结果如下:
+
+```sql
++------+------+
+|origin|approx|
++------+------+
+| 3| 3|
++------+------+
+Total line number = 1
+It costs 0.022s
+```
+
+
+#### 2.3.5 First
查询`temperature`列、`humidity`列时间戳最小且不为 NULL 的值。
@@ -249,7 +271,7 @@ Total line number = 1
It costs 0.170s
```
-#### 2.3.5 Last
+#### 2.3.6 Last
查询`temperature`列、`humidity`列时间戳最大且不为 NULL 的值。
@@ -269,7 +291,7 @@ Total line number = 1
It costs 0.211s
```
-#### 2.3.6 First_by
+#### 2.3.7 First_by
查询 `temperature` 列中非 NULL 且时间戳最小的行的 `time` 值,以及 `temperature` 列中非 NULL
且时间戳最小的行的 `humidity` 值。
@@ -289,7 +311,7 @@ Total line number = 1
It costs 0.269s
```
-#### 2.3.7 Last_by
+#### 2.3.8 Last_by
查询`temperature` 列中非 NULL 且时间戳最大的行的 `time` 值,以及 `temperature` 列中非 NULL
且时间戳最大的行的 `humidity` 值。
@@ -309,7 +331,7 @@ Total line number = 1
It costs 0.070s
```
-#### 2.3.8 Max_by
+#### 2.3.9 Max_by
查询`temperature` 列中最大值所在行的 `time` 值,以及`temperature` 列中最大值所在行的 `humidity` 值。
@@ -329,7 +351,7 @@ Total line number = 1
It costs 0.172s
```
-#### 2.3.9 Min_by
+#### 2.3.10 Min_by
查询`temperature` 列中最小值所在行的 `time` 值,以及`temperature` 列中最小值所在行的 `humidity` 值。
diff --git a/src/zh/UserGuide/latest-Table/SQL-Manual/Basis-Function.md
b/src/zh/UserGuide/latest-Table/SQL-Manual/Basis-Function.md
index 33ca019b..528d738e 100644
--- a/src/zh/UserGuide/latest-Table/SQL-Manual/Basis-Function.md
+++ b/src/zh/UserGuide/latest-Table/SQL-Manual/Basis-Function.md
@@ -155,28 +155,29 @@ SELECT LEAST(temperature,humidity) FROM table2;
### 2.2 支持的聚合函数
-| 函数名 | 功能描述 |
允许的输入类型 | 输出类型 |
-| ----------- | ------------------------------------------------------------
|-----------------------------------------------|------------------|
-| COUNT | 计算数据点数。 | 所有类型
| INT64 |
-| COUNT_IF | COUNT_IF(exp) 用于统计满足指定布尔表达式的记录行数 | exp 必须是一个布尔类型的表达式,例如
count_if(temperature>20) | INT64 |
-| SUM | 求和。 |
INT32 INT64 FLOAT DOUBLE | DOUBLE |
-| AVG | 求平均值。 |
INT32 INT64 FLOAT DOUBLE | DOUBLE |
-| MAX | 求最大值。 | 所有类型
| 与输入类型一致 |
-| MIN | 求最小值。 | 所有类型
| 与输入类型一致 |
-| FIRST | 求时间戳最小且不为 NULL 的值。 | 所有类型
| 与输入类型一致 |
-| LAST | 求时间戳最大且不为 NULL 的值。 | 所有类型
| 与输入类型一致 |
-| STDDEV | STDDEV_SAMP 的别名,求样本标准差。 | INT32
INT64 FLOAT DOUBLE | DOUBLE |
-| STDDEV_POP | 求总体标准差。 | INT32
INT64 FLOAT DOUBLE | DOUBLE |
-| STDDEV_SAMP | 求样本标准差。 | INT32
INT64 FLOAT DOUBLE | DOUBLE |
-| VARIANCE | VAR_SAMP 的别名,求样本方差。 | INT32
INT64 FLOAT DOUBLE | DOUBLE |
-| VAR_POP | 求总体方差。 | INT32
INT64 FLOAT DOUBLE | DOUBLE |
-| VAR_SAMP | 求样本方差。 | INT32
INT64 FLOAT DOUBLE | DOUBLE |
-| EXTREME | 求具有最大绝对值的值。如果正值和负值的最大绝对值相等,则返回正值。 | INT32 INT64 FLOAT DOUBLE
| 与输入类型一致 |
-| MODE | 求众数。注意: 1.输入序列的不同值个数过多时会有内存异常风险;
2.如果所有元素出现的频次相同,即没有众数,则随机返回一个元素; 3.如果有多个众数,则随机返回一个众数; 4. NULL
值也会被统计频次,所以即使输入序列的值不全为 NULL,最终结果也可能为 NULL。 | 所有类型
| 与输入类型一致 |
-| MAX_BY | MAX_BY(x, y) 求二元输入 x 和 y 在 y 最大时对应的 x 的值。MAX_BY(time, x) 返回 x
取最大值时对应的时间戳。 | x 和 y 可以是任意类型 | 与第一个输入 x 的数据类型一致
|
-| MIN_BY | MIN_BY(x, y) 求二元输入 x 和 y 在 y 最小时对应的 x 的值。MIN_BY(time, x) 返回 x
取最小值时对应的时间戳。 | x 和 y 可以是任意类型 | 与第一个输入 x 的数据类型一致
|
-| FIRST_BY | FIRST_BY(x, y) 求当 y 为第一个不为 NULL 的值时,同一行里对应的 x 值。 | x 和 y
可以是任意类型 | 与第一个输入 x 的数据类型一致 |
-| LAST_BY | LAST_BY(x, y) 求当 y 为最后一个不为 NULL 的值时,同一行里对应的 x 值。 | x 和 y
可以是任意类型 | 与第一个输入 x 的数据类型一致 |
+| 函数名 | 功能描述
| 允许的输入类型
| 输出类型 |
+|-----------------------|------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------|------------------|
+| COUNT | 计算数据点数。
| 所有类型
| INT64 |
+| COUNT_IF | COUNT_IF(exp) 用于统计满足指定布尔表达式的记录行数
| exp 必须是一个布尔类型的表达式,例如 count_if(temperature>20)
| INT64 |
+| APPROX_COUNT_DISTINCT | APPROX_COUNT_DISTINCT(x[,maxStandardError]) 函数提供
COUNT(DISTINCT x) 的近似值,返回不同输入值的近似个数。 |
x:待计算列,支持所有类型;<br> maxStandardError:指定该函数应产生的最大标准误差,取值范围[0.0040625,
0.26],未指定值时默认0.023。 | INT64 |
+| SUM | 求和。
| INT32 INT64 FLOAT DOUBLE
| DOUBLE |
+| AVG | 求平均值。
| INT32 INT64 FLOAT DOUBLE
| DOUBLE |
+| MAX | 求最大值。
| 所有类型
| 与输入类型一致 |
+| MIN | 求最小值。
| 所有类型
| 与输入类型一致 |
+| FIRST | 求时间戳最小且不为 NULL 的值。
| 所有类型
| 与输入类型一致 |
+| LAST | 求时间戳最大且不为 NULL 的值。
| 所有类型
| 与输入类型一致 |
+| STDDEV | STDDEV_SAMP 的别名,求样本标准差。
| INT32 INT64 FLOAT DOUBLE
| DOUBLE |
+| STDDEV_POP | 求总体标准差。
| INT32 INT64 FLOAT DOUBLE
| DOUBLE |
+| STDDEV_SAMP | 求样本标准差。
| INT32 INT64 FLOAT DOUBLE
| DOUBLE |
+| VARIANCE | VAR_SAMP 的别名,求样本方差。
| INT32 INT64 FLOAT DOUBLE
| DOUBLE |
+| VAR_POP | 求总体方差。
| INT32 INT64 FLOAT DOUBLE
| DOUBLE |
+| VAR_SAMP | 求样本方差。
| INT32 INT64 FLOAT DOUBLE
| DOUBLE |
+| EXTREME | 求具有最大绝对值的值。如果正值和负值的最大绝对值相等,则返回正值。
| INT32 INT64 FLOAT DOUBLE
| 与输入类型一致 |
+| MODE | 求众数。注意: 1.输入序列的不同值个数过多时会有内存异常风险;
2.如果所有元素出现的频次相同,即没有众数,则随机返回一个元素; 3.如果有多个众数,则随机返回一个众数; 4. NULL
值也会被统计频次,所以即使输入序列的值不全为 NULL,最终结果也可能为 NULL。 | 所有类型
| 与输入类型一致 |
+| MAX_BY | MAX_BY(x, y) 求二元输入 x 和 y 在 y 最大时对应的 x
的值。MAX_BY(time, x) 返回 x 取最大值时对应的时间戳。
| x 和 y 可以是任意类型
| 与第一个输入 x 的数据类型一致 |
+| MIN_BY | MIN_BY(x, y) 求二元输入 x 和 y 在 y 最小时对应的 x
的值。MIN_BY(time, x) 返回 x 取最小值时对应的时间戳。
| x 和 y 可以是任意类型
| 与第一个输入 x 的数据类型一致 |
+| FIRST_BY | FIRST_BY(x, y) 求当 y 为第一个不为 NULL 的值时,同一行里对应的 x 值。
| x 和 y 可以是任意类型
| 与第一个输入 x 的数据类型一致 |
+| LAST_BY | LAST_BY(x, y) 求当 y 为最后一个不为 NULL 的值时,同一行里对应的 x 值。
| x 和 y 可以是任意类型
| 与第一个输入 x 的数据类型一致 |
### 2.3 示例
@@ -213,7 +214,7 @@ It costs 0.834s
统计 `table2` 中 到达时间 `arrival_time` 不是 `null` 的记录行数。
```sql
-select count_if(arrival_time is not null) from table2;
+IoTDB> select count_if(arrival_time is not null) from table2;
```
执行结果如下:
@@ -228,8 +229,29 @@ Total line number = 1
It costs 0.047s
```
+#### 2.3.4 Approx_count_distinct
-#### 2.3.4 First
+查询 `table1` 中 `temperature` 列不同值的个数。
+
+```sql
+IoTDB> SELECT COUNT(DISTINCT temperature) as origin,
APPROX_COUNT_DISTINCT(temperature) as approx FROM table1;
+IoTDB> SELECT COUNT(DISTINCT temperature) as origin,
APPROX_COUNT_DISTINCT(temperature,0.006) as approx FROM table1;
+```
+
+执行结果如下:
+
+```sql
++------+------+
+|origin|approx|
++------+------+
+| 3| 3|
++------+------+
+Total line number = 1
+It costs 0.022s
+```
+
+
+#### 2.3.5 First
查询`temperature`列、`humidity`列时间戳最小且不为 NULL 的值。
@@ -249,7 +271,7 @@ Total line number = 1
It costs 0.170s
```
-#### 2.3.5 Last
+#### 2.3.6 Last
查询`temperature`列、`humidity`列时间戳最大且不为 NULL 的值。
@@ -269,7 +291,7 @@ Total line number = 1
It costs 0.211s
```
-#### 2.3.6 First_by
+#### 2.3.7 First_by
查询 `temperature` 列中非 NULL 且时间戳最小的行的 `time` 值,以及 `temperature` 列中非 NULL
且时间戳最小的行的 `humidity` 值。
@@ -289,7 +311,7 @@ Total line number = 1
It costs 0.269s
```
-#### 2.3.7 Last_by
+#### 2.3.8 Last_by
查询`temperature` 列中非 NULL 且时间戳最大的行的 `time` 值,以及 `temperature` 列中非 NULL
且时间戳最大的行的 `humidity` 值。
@@ -309,7 +331,7 @@ Total line number = 1
It costs 0.070s
```
-#### 2.3.8 Max_by
+#### 2.3.9 Max_by
查询`temperature` 列中最大值所在行的 `time` 值,以及`temperature` 列中最大值所在行的 `humidity` 值。
@@ -329,7 +351,7 @@ Total line number = 1
It costs 0.172s
```
-#### 2.3.9 Min_by
+#### 2.3.10 Min_by
查询`temperature` 列中最小值所在行的 `time` 值,以及`temperature` 列中最小值所在行的 `humidity` 值。
