(iotdb-docs) branch main updated: add function: approx_most_frequent (#800)

critas Thu, 11 Dec 2025 04:00:03 -0800

This is an automated email from the ASF dual-hosted git repository.

critas pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/iotdb-docs.git



The following commit(s) were added to refs/heads/main by this push:
     new daeb8fc4 add function: approx_most_frequent (#800)
daeb8fc4 is described below

commit daeb8fc436965690f4e55ad905f46efe30bdecac
Author: leto-b <[email protected]>
AuthorDate: Thu Dec 11 19:59:52 2025 +0800

    add function: approx_most_frequent (#800)
    
    * add function: approx_most_frequent
    
    * add version
---
 .../Master/Table/SQL-Manual/Basis-Function.md      | 79 ++++++++++++++--------
 .../latest-Table/SQL-Manual/Basis-Function.md      | 33 +++++++--
 .../Master/Table/SQL-Manual/Basis-Function.md      | 36 ++++++++--
 .../latest-Table/SQL-Manual/Basis-Function.md      | 35 ++++++++--
 4 files changed, 134 insertions(+), 49 deletions(-)

diff --git a/src/UserGuide/Master/Table/SQL-Manual/Basis-Function.md b/src/UserGuide/Master/Table/SQL-Manual/Basis-Function.md
index b50e10dd..d8c96720 100644
--- a/src/UserGuide/Master/Table/SQL-Manual/Basis-Function.md
+++ b/src/UserGuide/Master/Table/SQL-Manual/Basis-Function.md
@@ -156,29 +156,30 @@ SELECT LEAST(temperature,humidity) FROM table2;
 
 ### 2.2 Supported Aggregate Functions                            
 
-| Function Name          | Description [...]
-|:-----------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 [...]
-| COUNT                  | Counts the number of data points. [...]
-| COUNT_IF               | COUNT_IF(exp) counts the number of rows that satisfy a specified boolean expression. [...]
-| APPROX_COUNT_DISTINCT  | The APPROX_COUNT_DISTINCT(x[, maxStandardError]) function provides an approximation of COUNT(DISTINCT x), returning the estimated number of distinct input values. [...]
-| SUM                    | Calculates the sum. [...]
-| AVG                    | Calculates the average. [...]
-| MAX                    | Finds the maximum value. [...]
-| MIN                    | Finds the minimum value. [...]
-| FIRST                  | Finds the value with the smallest timestamp that is not NULL. [...]
-| LAST                   | Finds the value with the largest timestamp that is not NULL. [...]
-| STDDEV                 | Alias for STDDEV_SAMP, calculates the sample standard deviation. [...]
-| STDDEV_POP             | Calculates the population standard deviation. [...]
-| STDDEV_SAMP            | Calculates the sample standard deviation. [...]
-| VARIANCE               | Alias for VAR_SAMP, calculates the sample variance. [...]
-| VAR_POP                | Calculates the population variance. [...]
-| VAR_SAMP               | Calculates the sample variance. [...]
-| EXTREME                | Finds the value with the largest absolute value. If the largest absolute values of positive and negative values are equal, returns the positive value. [...]
-| MODE                   | Finds the mode. Note: 1. There is a risk of memory exception when the number of distinct values in the input sequence is too large; 2. If all elements have the same frequency, i.e., there is no mode, a random element is returned; 3. If there are multiple modes, a random mode is returned; 4. NULL values are also counted in frequency, so even if not all values in the input sequence are NULL, the final result may still be NULL. [...]
-| MAX_BY                 | MAX_BY(x, y) finds the value of x corresponding to the maximum y in the binary input x and y. MAX_BY(time, x) returns the timestamp when x is at its maximum. [...]
-| MIN_BY                 | MIN_BY(x, y) finds the value of x corresponding to the minimum y in the binary input x and y. MIN_BY(time, x) returns the timestamp when x is at its minimum. [...]
-| FIRST_BY               | FIRST_BY(x, y) finds the value of x in the same row when y is the first non-null value. [...]
-| LAST_BY                | LAST_BY(x, y) finds the value of x in the same row when y is the last non-null value. [...]
+| Function Name          | Description | Allowed Input Types [...]
+|:-----------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------
 [...]
+| COUNT                  | Counts the number of data points. | All types [...]
+| COUNT_IF               | COUNT_IF(exp) counts the number of rows that satisfy a specified boolean expression. | `exp` must be a boolean expression [...]
+| APPROX_COUNT_DISTINCT  | The APPROX_COUNT_DISTINCT(x[, maxStandardError]) function provides an approximation of COUNT(DISTINCT x), returning the estimated number of distinct input values. | `x`: The target column to be calcu [...]
+| APPROX_MOST_FREQUENT   | The APPROX_MOST_FREQUENT(x, k, capacity) function is used to approximately calculate the top k most frequent elements in a dataset. It returns a JSON-formatted string where the keys are the element values and the values are their corresponding approximate frequencies. (Available since V2.0.5.1) | `x`: The column to be calculated, s [...]
+| SUM                    | Calculates the sum. | INT32 INT64 FLOAT DOUBLE [...]
+| AVG                    | Calculates the average. | INT32 INT64 FLOAT DOUBLE [...]
+| MAX                    | Finds the maximum value. | All types [...]
+| MIN                    | Finds the minimum value. | All types [...]
+| FIRST                  | Finds the value with the smallest timestamp that is not NULL. | All types [...]
+| LAST                   | Finds the value with the largest timestamp that is not NULL. | All types [...]
+| STDDEV                 | Alias for STDDEV_SAMP, calculates the sample standard deviation. | INT32 INT64 FLOAT DOUBLE [...]
+| STDDEV_POP             | Calculates the population standard deviation. | INT32 INT64 FLOAT DOUBLE [...]
+| STDDEV_SAMP            | Calculates the sample standard deviation. | INT32 INT64 FLOAT DOUBLE [...]
+| VARIANCE               | Alias for VAR_SAMP, calculates the sample variance. | INT32 INT64 FLOAT DOUBLE [...]
+| VAR_POP                | Calculates the population variance. | INT32 INT64 FLOAT DOUBLE [...]
+| VAR_SAMP               | Calculates the sample variance. | INT32 INT64 FLOAT DOUBLE [...]
+| EXTREME                | Finds the value with the largest absolute value. If the largest absolute values of positive and negative values are equal, returns the positive value. | INT32 INT64 FLOAT DOUBLE [...]
+| MODE                   | Finds the mode. Note: 1. There is a risk of memory exception when the number of distinct values in the input sequence is too large; 2. If all elements have the same frequency, i.e., there is no mode, a random element is returned; 3. If there are multiple modes, a random mode is returned; 4. NULL values are also counted in frequency, so even if not all values in the input sequence are NULL, the final result may still be NULL. | All types [...]
+| MAX_BY                 | MAX_BY(x, y) finds the value of x corresponding to the maximum y in the binary input x and y. MAX_BY(time, x) returns the timestamp when x is at its maximum. | x and y can be of any type [...]
+| MIN_BY                 | MIN_BY(x, y) finds the value of x corresponding to the minimum y in the binary input x and y. MIN_BY(time, x) returns the timestamp when x is at its minimum. | x and y can be of any type [...]
+| FIRST_BY               | FIRST_BY(x, y) finds the value of x in the same row when y is the first non-null value. | x and y can be of any type [...]
+| LAST_BY                | LAST_BY(x, y) finds the value of x in the same row when y is the last non-null value. | x and y can be of any type [...]
 
 
 ### 2.3 Examples
@@ -251,8 +252,28 @@ Total line number = 1
 It costs 0.022s
 ```
 
+#### 2.3.5 Approx_most_frequent
 
-#### 2.3.5 First
+Query the top 2 most frequent values in the `temperature` column of `table1`.
+
+```sql
+IoTDB> select approx_most_frequent(temperature,2,100) as topk from table1;
+```
+
+The execution result is as follows:
+
+```sql
++-------------------+
+|               topk|
++-------------------+
+|{"85.0":6,"90.0":5}|
++-------------------+
+Total line number = 1
+It costs 0.064s
+```
+
+
+#### 2.3.6 First
 
 Finds the values with the smallest timestamp that are not NULL in the `temperature` and `humidity` columns.
 
@@ -272,7 +293,7 @@ Total line number = 1
 It costs 0.170s
 ```
 
-#### 2.3.6 Last
+#### 2.3.7 Last
 
 Finds the values with the largest timestamp that are not NULL in the `temperature` and `humidity` columns.
 
@@ -292,7 +313,7 @@ Total line number = 1
 It costs 0.211s
 ```
 
-#### 2.3.7 First_by
+#### 2.3.8 First_by
 
 Finds the `time` value of the row with the smallest timestamp that is not NULL in the `temperature` column, and the `humidity` value of the row with the smallest timestamp that is not NULL in the `temperature` column.
 
@@ -312,7 +333,7 @@ Total line number = 1
 It costs 0.269s
 ```
 
-#### 2.3.8 Last_by
+#### 2.3.9 Last_by
 
 Queries the `time` value of the row with the largest timestamp that is not NULL in the `temperature` column, and the `humidity` value of the row with the largest timestamp that is not NULL in the `temperature` column.
 
@@ -332,7 +353,7 @@ Total line number = 1
 It costs 0.070s
 ```
 
-#### 2.3.9 Max_by
+#### 2.3.10 Max_by
 
 Queries the `time` value of the row where the `temperature` column is at its maximum, and the `humidity` value of the row where the `temperature` column is at its maximum.
 
@@ -352,7 +373,7 @@ Total line number = 1
 It costs 0.172s
 ```
 
-#### 2.3.10 Min_by
+#### 2.3.11 Min_by
 
 Queries the `time` value of the row where the `temperature` column is at its minimum, and the `humidity` value of the row where the `temperature` column is at its minimum.
 
diff --git a/src/UserGuide/latest-Table/SQL-Manual/Basis-Function.md b/src/UserGuide/latest-Table/SQL-Manual/Basis-Function.md
index b50e10dd..65ba2014 100644
--- a/src/UserGuide/latest-Table/SQL-Manual/Basis-Function.md
+++ b/src/UserGuide/latest-Table/SQL-Manual/Basis-Function.md
@@ -161,6 +161,7 @@ SELECT LEAST(temperature,humidity) FROM table2;
 | COUNT                  | Counts the number of data points. [...]
 | COUNT_IF               | COUNT_IF(exp) counts the number of rows that satisfy a specified boolean expression. [...]
 | APPROX_COUNT_DISTINCT  | The APPROX_COUNT_DISTINCT(x[, maxStandardError]) function provides an approximation of COUNT(DISTINCT x), returning the estimated number of distinct input values. [...]
+| APPROX_MOST_FREQUENT   | The APPROX_MOST_FREQUENT(x, k, capacity) function is used to approximately calculate the top k most frequent elements in a dataset. It returns a JSON-formatted string where the keys are the element values and the values are their corresponding approximate frequencies. (Available since V2.0.5.1) | `x`: The column to be calculated, supporting all existing data types in IoTDB;<br> `k`: The number of top-k most frequent values to return;<br>`capacity`: The number of [...]
 | SUM                    | Calculates the sum. [...]
 | AVG                    | Calculates the average. [...]
 | MAX                    | Finds the maximum value. [...]
@@ -251,8 +252,28 @@ Total line number = 1
 It costs 0.022s
 ```
 
+#### 2.3.5 Approx_most_frequent
 
-#### 2.3.5 First
+Query the top 2 most frequent values in the `temperature` column of `table1`.
+
+```sql
+IoTDB> select approx_most_frequent(temperature,2,100) as topk from table1;
+```
+
+The execution result is as follows:
+
+```sql
++-------------------+
+|               topk|
++-------------------+
+|{"85.0":6,"90.0":5}|
++-------------------+
+Total line number = 1
+It costs 0.064s
+```
+
+
+#### 2.3.6 First
 
 Finds the values with the smallest timestamp that are not NULL in the `temperature` and `humidity` columns.
 
@@ -272,7 +293,7 @@ Total line number = 1
 It costs 0.170s
 ```
 
-#### 2.3.6 Last
+#### 2.3.7 Last
 
 Finds the values with the largest timestamp that are not NULL in the `temperature` and `humidity` columns.
 
@@ -292,7 +313,7 @@ Total line number = 1
 It costs 0.211s
 ```
 
-#### 2.3.7 First_by
+#### 2.3.8 First_by
 
 Finds the `time` value of the row with the smallest timestamp that is not NULL in the `temperature` column, and the `humidity` value of the row with the smallest timestamp that is not NULL in the `temperature` column.
 
@@ -312,7 +333,7 @@ Total line number = 1
 It costs 0.269s
 ```
 
-#### 2.3.8 Last_by
+#### 2.3.9 Last_by
 
 Queries the `time` value of the row with the largest timestamp that is not NULL in the `temperature` column, and the `humidity` value of the row with the largest timestamp that is not NULL in the `temperature` column.
 
@@ -332,7 +353,7 @@ Total line number = 1
 It costs 0.070s
 ```
 
-#### 2.3.9 Max_by
+#### 2.3.10 Max_by
 
 Queries the `time` value of the row where the `temperature` column is at its maximum, and the `humidity` value of the row where the `temperature` column is at its maximum.
 
@@ -352,7 +373,7 @@ Total line number = 1
 It costs 0.172s
 ```
 
-#### 2.3.10 Min_by
+#### 2.3.11 Min_by
 
 Queries the `time` value of the row where the `temperature` column is at its minimum, and the `humidity` value of the row where the `temperature` column is at its minimum.
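
Note on consuming the `approx_most_frequent` result: the `topk` column in the 2.3.5 example is a single JSON string whose keys are the column values and whose values are approximate counts. A client would usually need to decode it before use. The following is a minimal sketch in Python using only the standard library; the helper name `parse_topk` is hypothetical and not part of IoTDB, and the input literal is copied from the example output above.

```python
import json


def parse_topk(raw: str) -> list[tuple[str, int]]:
    """Decode an approx_most_frequent JSON string into (value, count) pairs.

    Pairs are ordered from most to least frequent. Keys stay strings,
    because JSON object keys are always strings.
    """
    counts = json.loads(raw)
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)


# Literal taken from the 2.3.5 example result.
for value, count in parse_topk('{"85.0":6,"90.0":5}'):
    print(f"{value}: about {count} occurrences")
```

Because the counts are approximate, callers should treat them as estimates rather than exact frequencies.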
 
diff --git a/src/zh/UserGuide/Master/Table/SQL-Manual/Basis-Function.md b/src/zh/UserGuide/Master/Table/SQL-Manual/Basis-Function.md
index c55ba021..6ed554c8 100644
--- a/src/zh/UserGuide/Master/Table/SQL-Manual/Basis-Function.md
+++ b/src/zh/UserGuide/Master/Table/SQL-Manual/Basis-Function.md
@@ -159,7 +159,8 @@ SELECT LEAST(temperature,humidity) FROM table2;
 
|-----------------------|------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------|------------------|
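 | COUNT                 | Counts the number of data points. | All types | INT64            |
 | COUNT_IF              | COUNT_IF(exp) counts the number of rows that satisfy the specified boolean expression. | exp must be a boolean expression, e.g. count_if(temperature>20) | INT64            |
-| APPROX_COUNT_DISTINCT | APPROX_COUNT_DISTINCT(x[,maxStandardError]) provides an approximation of COUNT(DISTINCT x), returning the approximate number of distinct input values. | x: the column to compute over, all types supported;<br> maxStandardError: the maximum standard error the function should produce, in the range [0.0040625, 0.26]; defaults to 0.023 when not specified. | INT64            |
+| APPROX_COUNT_DISTINCT | APPROX_COUNT_DISTINCT(x[,maxStandardError]) provides an approximation of COUNT(DISTINCT x), returning the approximate number of distinct input values. | `x`: the column to compute over, all types supported;<br> `maxStandardError`: the maximum standard error the function should produce, in the range [0.0040625, 0.26]; defaults to 0.023 when not specified. | INT64            |
+| APPROX_MOST_FREQUENT | APPROX_MOST_FREQUENT(x, k, capacity) approximates the top k most frequent elements in a data set. It returns a JSON-formatted string in which each key is an element's value and each value is that element's approximate frequency. (Supported in V 2.0.5.1 and later) | `x`: the column to compute over, supporting all existing IoTDB data types;<br> `k`: the number of most frequent values to return;<br> `capacity`: the number of buckets used for the computation, which relates to memory usage: a larger value gives a smaller error but uses more memory, while a smaller value gives a larger error but uses less memory. | STRING   |
 | SUM                   | Calculates the sum. | INT32 INT64 FLOAT DOUBLE | DOUBLE           |
 | AVG                   | Calculates the average. | INT32 INT64 FLOAT DOUBLE | DOUBLE           |
 | MAX                   | Finds the maximum value. | All types | Same as the input type |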
@@ -251,7 +252,28 @@ It costs 0.022s
 ```
 
 
-#### 2.3.5 First
+#### 2.3.5 Approx_most_frequent
+
+Query the 2 most frequent values in the `temperature` column of `table1`.
+
+```sql
+IoTDB> select approx_most_frequent(temperature,2,100) as topk from table1;
+```
+
+The execution result is as follows:
+
+```sql
++-------------------+
+|               topk|
++-------------------+
+|{"85.0":6,"90.0":5}|
++-------------------+
+Total line number = 1
+It costs 0.064s
+```
+
+
+#### 2.3.6 First
 
 Queries the non-NULL values with the smallest timestamp in the `temperature` and `humidity` columns.
 
@@ -271,7 +293,7 @@ Total line number = 1
 It costs 0.170s
 ```
 
-#### 2.3.6 Last
+#### 2.3.7 Last
 
 Queries the non-NULL values with the largest timestamp in the `temperature` and `humidity` columns.
 
@@ -291,7 +313,7 @@ Total line number = 1
 It costs 0.211s
 ```
 
-#### 2.3.7 First_by
+#### 2.3.8 First_by
 
 Queries the `time` value of the row with the smallest timestamp whose `temperature` is not NULL, and the `humidity` value of the row with the smallest timestamp whose `temperature` is not NULL.
 
@@ -311,7 +333,7 @@ Total line number = 1
 It costs 0.269s
 ```
 
-#### 2.3.8 Last_by
+#### 2.3.9 Last_by
 
 Queries the `time` value of the row with the largest timestamp whose `temperature` is not NULL, and the `humidity` value of the row with the largest timestamp whose `temperature` is not NULL.
 
@@ -331,7 +353,7 @@ Total line number = 1
 It costs 0.070s
 ```
 
-#### 2.3.9 Max_by
+#### 2.3.10 Max_by
 
 Queries the `time` value of the row where the `temperature` column is at its maximum, and the `humidity` value of the row where the `temperature` column is at its maximum.
 
@@ -351,7 +373,7 @@ Total line number = 1
 It costs 0.172s
 ```
 
-#### 2.3.10 Min_by
+#### 2.3.11 Min_by
 
 Queries the `time` value of the row where the `temperature` column is at its minimum, and the `humidity` value of the row where the `temperature` column is at its minimum.
 
diff --git a/src/zh/UserGuide/latest-Table/SQL-Manual/Basis-Function.md b/src/zh/UserGuide/latest-Table/SQL-Manual/Basis-Function.md
index c55ba021..219d6820 100644
--- a/src/zh/UserGuide/latest-Table/SQL-Manual/Basis-Function.md
+++ b/src/zh/UserGuide/latest-Table/SQL-Manual/Basis-Function.md
@@ -159,7 +159,8 @@ SELECT LEAST(temperature,humidity) FROM table2;
 
|-----------------------|------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------|------------------|
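 | COUNT                 | Counts the number of data points. | All types | INT64            |
 | COUNT_IF              | COUNT_IF(exp) counts the number of rows that satisfy the specified boolean expression. | exp must be a boolean expression, e.g. count_if(temperature>20) | INT64            |
-| APPROX_COUNT_DISTINCT | APPROX_COUNT_DISTINCT(x[,maxStandardError]) provides an approximation of COUNT(DISTINCT x), returning the approximate number of distinct input values. | x: the column to compute over, all types supported;<br> maxStandardError: the maximum standard error the function should produce, in the range [0.0040625, 0.26]; defaults to 0.023 when not specified. | INT64            |
+| APPROX_COUNT_DISTINCT | APPROX_COUNT_DISTINCT(x[,maxStandardError]) provides an approximation of COUNT(DISTINCT x), returning the approximate number of distinct input values. | `x`: the column to compute over, all types supported;<br> `maxStandardError`: the maximum standard error the function should produce, in the range [0.0040625, 0.26]; defaults to 0.023 when not specified. | INT64            |
+| APPROX_MOST_FREQUENT | APPROX_MOST_FREQUENT(x, k, capacity) approximates the top k most frequent elements in a data set. It returns a JSON-formatted string in which each key is an element's value and each value is that element's approximate frequency. (Supported in V 2.0.5.1 and later) | `x`: the column to compute over, supporting all existing IoTDB data types;<br> `k`: the number of most frequent values to return;<br> `capacity`: the number of buckets used for the computation, which relates to memory usage: a larger value gives a smaller error but uses more memory, while a smaller value gives a larger error but uses less memory. | STRING   |
 | SUM                   | Calculates the sum. | INT32 INT64 FLOAT DOUBLE | DOUBLE           |
 | AVG                   | Calculates the average. | INT32 INT64 FLOAT DOUBLE | DOUBLE           |
 | MAX                   | Finds the maximum value. | All types | Same as the input type |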
@@ -250,8 +251,28 @@ Total line number = 1
 It costs 0.022s
 ```
 
+#### 2.3.5 Approx_most_frequent
 
-#### 2.3.5 First
+Query the 2 most frequent values in the `temperature` column of `table1`.
+
+```sql
+IoTDB> select approx_most_frequent(temperature,2,100) as topk from table1;
+```
+
+The execution result is as follows:
+
+```sql
++-------------------+
+|               topk|
++-------------------+
+|{"85.0":6,"90.0":5}|
++-------------------+
+Total line number = 1
+It costs 0.064s
+```
+
+
+#### 2.3.6 First
 
 Queries the non-NULL values with the smallest timestamp in the `temperature` and `humidity` columns.
 
@@ -271,7 +292,7 @@ Total line number = 1
 It costs 0.170s
 ```
 
-#### 2.3.6 Last
+#### 2.3.7 Last
 
 Queries the non-NULL values with the largest timestamp in the `temperature` and `humidity` columns.
 
@@ -291,7 +312,7 @@ Total line number = 1
 It costs 0.211s
 ```
 
-#### 2.3.7 First_by
+#### 2.3.8 First_by
 
 Queries the `time` value of the row with the smallest timestamp whose `temperature` is not NULL, and the `humidity` value of the row with the smallest timestamp whose `temperature` is not NULL.
 
@@ -311,7 +332,7 @@ Total line number = 1
 It costs 0.269s
 ```
 
-#### 2.3.8 Last_by
+#### 2.3.9 Last_by
 
 Queries the `time` value of the row with the largest timestamp whose `temperature` is not NULL, and the `humidity` value of the row with the largest timestamp whose `temperature` is not NULL.
 
@@ -331,7 +352,7 @@ Total line number = 1
 It costs 0.070s
 ```
 
-#### 2.3.9 Max_by
+#### 2.3.10 Max_by
 
 Queries the `time` value of the row where the `temperature` column is at its maximum, and the `humidity` value of the row where the `temperature` column is at its maximum.
 
@@ -351,7 +372,7 @@ Total line number = 1
 It costs 0.172s
 ```
 
-#### 2.3.10 Min_by
+#### 2.3.11 Min_by
 
 Queries the `time` value of the row where the `temperature` column is at its minimum, and the `humidity` value of the row where the `temperature` column is at its minimum.
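
Note on the `capacity` parameter: the function table describes `capacity` as the number of buckets used for the computation, with more buckets giving a smaller error at the cost of more memory. The commit does not say which algorithm IoTDB uses, so the sketch below is only an illustration of that trade-off using the well-known Space-Saving counter scheme, chosen here as an assumption. The function name `space_saving_topk` and the sample stream are hypothetical.

```python
def space_saving_topk(stream, k, capacity):
    """Approximate the k most frequent items using at most `capacity` counters.

    Space-Saving scheme: when all counters are in use and a new item
    arrives, the item with the smallest count is evicted and the new
    item inherits that count plus one, so counts can be overestimates.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < capacity:
            counters[item] = 1
        else:
            victim = min(counters, key=counters.get)
            counters[item] = counters.pop(victim) + 1
    top = sorted(counters.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return dict(top)


# Hypothetical readings: 85.0 six times, 90.0 five times, two one-off values.
readings = [85.0] * 6 + [90.0] * 5 + [70.0, 71.0]

print(space_saving_topk(readings, k=2, capacity=100))  # enough counters for every value
print(space_saving_topk(readings, k=2, capacity=2))    # too few counters, counts drift
```

With `capacity` at least as large as the number of distinct values, every value keeps its own counter and the counts are exact. With a much smaller `capacity`, evictions inflate the counts of late arrivals, which is the larger error the parameter description warns about.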
