(iotdb-docs) branch main updated: Update Cluster UDF in UDF-Libraries (#1072)

jackietien Thu, 07 May 2026 02:35:54 -0700

This is an automated email from the ASF dual-hosted git repository.

JackieTien97 pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/iotdb-docs.git



The following commit(s) were added to refs/heads/main by this push:
     new 28869f96 Update Cluster UDF in UDF-Libraries (#1072)
28869f96 is described below

commit 28869f96944e11f0a09e10cee280e5283f94a6a8
Author: Yunxiang Su <[email protected]>
AuthorDate: Thu May 7 17:35:41 2026 +0800

    Update Cluster UDF in UDF-Libraries (#1072)
---
 .../V1.3.x/SQL-Manual/UDF-Libraries_apache.md      | 86 +++++++++++++++++++++
 .../V1.3.x/SQL-Manual/UDF-Libraries_timecho.md     | 86 +++++++++++++++++++++
 .../latest/SQL-Manual/UDF-Libraries_apache.md      | 86 +++++++++++++++++++++
 .../latest/SQL-Manual/UDF-Libraries_timecho.md     | 86 +++++++++++++++++++++
 .../V1.3.x/SQL-Manual/UDF-Libraries_apache.md      | 86 +++++++++++++++++++++
 .../V1.3.x/SQL-Manual/UDF-Libraries_timecho.md     | 86 +++++++++++++++++++++
 .../latest/SQL-Manual/UDF-Libraries_apache.md      | 88 ++++++++++++++++++++++
 .../latest/SQL-Manual/UDF-Libraries_timecho.md     | 86 +++++++++++++++++++++
 8 files changed, 690 insertions(+)

diff --git a/src/UserGuide/V1.3.x/SQL-Manual/UDF-Libraries_apache.md b/src/UserGuide/V1.3.x/SQL-Manual/UDF-Libraries_apache.md
index 5beb7791..7c93cbdb 100644
--- a/src/UserGuide/V1.3.x/SQL-Manual/UDF-Libraries_apache.md
+++ b/src/UserGuide/V1.3.x/SQL-Manual/UDF-Libraries_apache.md
@@ -4917,3 +4917,89 @@ Output Series:
 +-----------------------------+---------------------------+
 ```
 
+### Cluster
+
+#### Registration statement
+
+```sql
+create function cluster as 'org.apache.iotdb.library.dlearn.UDTFCluster'
+```
+
+#### Usage
+
+This function takes a **single input time series**, splits it into **non-overlapping** contiguous subsequences (windows) of fixed length `l`, and clusters those subsequences into `k` groups.
+
+**Name:** Cluster
+
+**Input Series:** Only supports a single numeric input series of type INT32 / INT64 / FLOAT / DOUBLE. Points are read in time order; trailing samples that do not fill a full window are dropped (only `⌊n/l⌋` windows are used, where `n` is the number of valid points).
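The windowing rule above can be sketched in Python. This is a minimal illustration, not the UDF's Java implementation: `split_windows` is a hypothetical helper, and it assumes the input has already been reduced to valid (non-null) points in time order.

```python
from typing import List, Sequence, Tuple


def split_windows(times: Sequence[int], values: Sequence[float],
                  l: int) -> Tuple[List[int], List[List[float]]]:
    """Split a time-ordered series into floor(n/l) non-overlapping windows.

    Trailing samples that do not fill a full window are dropped. Each window
    is tagged with the timestamp of its first sample, which is the timestamp
    documented for output="label".
    """
    if l <= 0:
        raise ValueError("l must be a positive integer")
    w = len(values) // l  # number of full windows, floor(n/l)
    starts = [times[i * l] for i in range(w)]
    windows = [list(values[i * l:(i + 1) * l]) for i in range(w)]
    return starts, windows


# Ten points with l = 3: the tenth point does not fill a window and is dropped.
ts = list(range(1, 11))
vs = [1, 2, 3, 10, 20, 30, 1, 5, 1, 99]
starts, windows = split_windows(ts, vs, 3)
print(starts)   # [1, 4, 7]
print(windows)  # [[1, 2, 3], [10, 20, 30], [1, 5, 1]]
```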
+
+**Parameters:**
+
+| Name | Meaning | Default | Notes |
+|------|---------|---------|--------|
+| `l` | Subsequence (window) length | (required) | Positive integer; each window contains `l` consecutive samples. |
+| `k` | Number of clusters | (required) | Integer ≥ 2. |
+| `method` | Clustering algorithm | `kmeans` | One of `kmeans`, `kshape`, `medoidshape` (case-insensitive). Defaults to k-means if omitted. |
+| `norm` | Z-score normalize each subsequence | `true` | Boolean; if `true`, each subsequence is standardized before clustering. |
+| `maxiter` | Maximum number of iterations | `200` | Positive integer. |
+| `output` | Output mode | `label` | `label`: one cluster id per window; `centroid`: the `k` centroid vectors, concatenated in cluster order. |
+| `sample_rate` | Greedy sampling rate | `0.3` | Used only when **`method` = `medoidshape`**; must be in `(0, 1]`. |
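With `norm` = `true`, each window is z-score normalized before clustering. The sketch below shows that step in isolation; it assumes the population standard deviation and maps a constant window to zeros, neither of which is specified in the table. It illustrates why normalization makes clustering insensitive to a window's offset and (positive) scale: `{1,2,3}` and `{10,20,30}` become the same vector.

```python
import math
from typing import List, Sequence


def z_normalize(window: Sequence[float]) -> List[float]:
    """Z-score one window: subtract its mean, divide by its standard deviation.

    Assumptions (not confirmed by the docs): population standard deviation
    is used, and a constant window (std == 0) is mapped to all zeros.
    """
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in window]


a = z_normalize([1, 2, 3])
b = z_normalize([10, 20, 30])
# Same shape after normalization, despite different offset and scale.
print(all(math.isclose(x, y, abs_tol=1e-12) for x, y in zip(a, b)))  # True
```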
+
+
+**`method` details:**
+
+- **kmeans**: k-means in Euclidean space (optionally after per-window normalization).
+- **kshape**: assigns windows to clusters by shape-based distance (SBD, derived from the normalized cross-correlation, NCC); centroids are updated via SVD of the cluster matrix.
+- **medoidshape**: performs a coarse clustering, then greedily selects `k` representative subsequences; `sample_rate` controls how many candidates are sampled in each round.
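For reference, the shape-based distance used by `kshape` is commonly defined (in the k-Shape algorithm) as 1 minus the maximum coefficient-normalized cross-correlation over all relative shifts. The sketch below computes that textbook definition directly in O(l²). It is not the UDF's code, which may compute NCC via FFT and may handle edge cases such as an all-zero window differently; the `1.0` returned for a zero-norm input is an assumption.

```python
import math
from typing import Sequence


def sbd(x: Sequence[float], y: Sequence[float]) -> float:
    """Shape-based distance: 1 - max over shifts of CC_s(x, y) / (||x|| * ||y||).

    The result lies in [0, 2]; 0 means the sequences match exactly at some
    shift, up to a positive scale factor.
    """
    m = len(x)
    if m != len(y):
        raise ValueError("sequences must have equal length")
    norm = math.sqrt(sum(v * v for v in x)) * math.sqrt(sum(v * v for v in y))
    if norm == 0:
        return 1.0  # assumption: treat an all-zero sequence as uncorrelated
    best = -math.inf
    # Shift s pairs x[i] with y[i - s]; only the overlapping part contributes.
    for s in range(-(m - 1), m):
        cc = sum(x[i] * y[i - s] for i in range(max(0, s), min(m, m + s)))
        best = max(best, cc / norm)
    return 1.0 - best


# A one-step shift of the same bump is a perfect match under SBD.
print(sbd([0, 1, 0, 0], [0, 0, 1, 0]))  # 0.0
```

Unlike Euclidean distance, SBD compares shapes while allowing one window to slide against the other, which is why `kshape` can group patterns that are slightly out of phase.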
+
+**Output Series:** Controlled by `output`:
+
+- **`output` = `label` (default):** One output series of type **INT32** with one point per full window, i.e. `⌊n/l⌋` points. Each point's timestamp is the **time of the first sample** in its window; its value is the cluster id **0 … k−1**.
+- **`output` = `centroid`:** One output series of type **DOUBLE** with **`k × l`** points: for clusters **0 → k−1**, the `l` components of each centroid are emitted in order (concatenated). Timestamps are `0, 1, 2, …` and serve only as placeholders with no physical time meaning.
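When `output` = `centroid`, a client that wants the centroids back as vectors has to undo the concatenation: row `j` belongs to cluster `j // l` and is component `j % l` of it. A minimal sketch, assuming the rows are identified by their placeholder timestamps; `flatten_centroids` and `split_centroids` are hypothetical helpers, and the centroid values are made up for illustration.

```python
from typing import List, Sequence, Tuple


def flatten_centroids(centroids: Sequence[Sequence[float]]) -> List[Tuple[int, float]]:
    """Lay out k centroids of length l as k*l (timestamp, value) rows:
    cluster 0 first, then cluster 1, ...; timestamps are placeholders 0, 1, 2, ..."""
    values = [float(v) for centroid in centroids for v in centroid]
    return list(enumerate(values))


def split_centroids(rows: Sequence[Tuple[int, float]], l: int) -> List[List[float]]:
    """Inverse of the layout above: regroup rows into k vectors of length l."""
    if l <= 0 or len(rows) % l != 0:
        raise ValueError("row count must be a multiple of l")
    ordered = [v for _, v in sorted(rows)]  # order by placeholder timestamp
    return [ordered[i:i + l] for i in range(0, len(ordered), l)]


# Made-up centroids for k = 2, l = 3.
rows = flatten_centroids([[-1.0, 0.0, 1.0], [-0.5, 1.0, -0.5]])
print(len(rows))                 # 6, i.e. k * l
print(split_centroids(rows, 3))  # [[-1.0, 0.0, 1.0], [-0.5, 1.0, -0.5]]
```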
+
+**Note:**
+
+- The number of valid points must satisfy `n ≥ l`, and the number of windows must satisfy `⌊n/l⌋ ≥ k`.
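The documented constraints can be checked on the client side before a query is issued. The sketch below gathers them in one place; `check_cluster_args` is hypothetical, its messages are not the UDF's actual error text, and `n` is assumed to be the count of valid points.

```python
from typing import List

METHODS = {"kmeans", "kshape", "medoidshape"}


def check_cluster_args(n: int, l: int, k: int, method: str = "kmeans",
                       maxiter: int = 200, sample_rate: float = 0.3) -> List[str]:
    """Return the documented constraints a call would violate (empty list = OK)."""
    problems = []
    if l <= 0:
        problems.append("l must be a positive integer")
    if k < 2:
        problems.append("k must be >= 2")
    if maxiter <= 0:
        problems.append("maxiter must be a positive integer")
    if method.lower() not in METHODS:  # method names are case-insensitive
        problems.append("unknown method")
    if method.lower() == "medoidshape" and not (0 < sample_rate <= 1):
        problems.append("sample_rate must be in (0, 1]")
    if l > 0 and n < l:
        problems.append("need n >= l")
    if l > 0 and n // l < k:
        problems.append("need floor(n/l) >= k")
    return problems


print(check_cluster_args(9, 3, 2, "kshape"))  # [] (the example below is valid)
```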
+
+#### Examples
+
+##### KShape: window length 3, k = 2
+
+Nine samples `{1,2,3,10,20,30,1,5,1}` form three non-overlapping windows: `{1,2,3}`, `{10,20,30}`, and `{1,5,1}`. With **`method` = `kshape`** (and the default `norm` = `true`), each output row holds the cluster id of one window and is timestamped with the window's start time. The resulting labels are **0, 0, 1**.
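A quick way to see why the first two windows share a label: after z-score normalization, `{1,2,3}` and `{10,20,30}` become the same vector, while `{1,5,1}` does not. The sketch below checks this with Euclidean distance on the normalized windows. That is a simplification, since `kshape` itself uses SBD, and it assumes `norm` uses the population standard deviation. Which group receives id 0 is up to the implementation; the check only shows that the first two windows belong together.

```python
import math
from statistics import fmean, pstdev

values = [1, 2, 3, 10, 20, 30, 1, 5, 1]
l = 3
# Non-overlapping windows of length l (no trailing remainder here: 9 // 3 == 3).
windows = [values[i:i + l] for i in range(0, len(values) - len(values) % l, l)]


def znorm(window):
    """Z-score one window (population standard deviation assumed)."""
    mu, sd = fmean(window), pstdev(window)
    return [(x - mu) / sd for x in window]


z = [znorm(w) for w in windows]
d01 = math.dist(z[0], z[1])  # {1,2,3} vs {10,20,30}: essentially zero
d02 = math.dist(z[0], z[2])  # {1,2,3} vs {1,5,1}: sqrt(6), about 2.449
print(round(d01, 6), round(d02, 6))
```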
+
+Input Series:
+
+```
++-----------------------------+---------------+
+|                         Time|root.test.d0.s0|
++-----------------------------+---------------+
+|2020-01-01T00:00:01.000+08:00|            1.0|
+|2020-01-01T00:00:02.000+08:00|            2.0|
+|2020-01-01T00:00:03.000+08:00|            3.0|
+|2020-01-01T00:00:04.000+08:00|           10.0|
+|2020-01-01T00:00:05.000+08:00|           20.0|
+|2020-01-01T00:00:06.000+08:00|           30.0|
+|2020-01-01T00:00:07.000+08:00|            1.0|
+|2020-01-01T00:00:08.000+08:00|            5.0|
+|2020-01-01T00:00:09.000+08:00|            1.0|
++-----------------------------+---------------+
+```
+
+SQL for query:
+
+```sql
+select cluster(s0, "l"="3", "k"="2", "method"="kshape", "output"="label")
+from root.test.d0
+```
+
+Output Series:
+
+```
++-----------------------------+----------------------------------------------------------------------------+
+|                         Time|cluster(root.test.d0.s0,"l"="3","k"="2","method"="kshape","output"="label")|
++-----------------------------+----------------------------------------------------------------------------+
+|2020-01-01T00:00:01.000+08:00|                                                                          0|
+|2020-01-01T00:00:04.000+08:00|                                                                          0|
+|2020-01-01T00:00:07.000+08:00|                                                                          1|
++-----------------------------+----------------------------------------------------------------------------+
+```
diff --git a/src/UserGuide/V1.3.x/SQL-Manual/UDF-Libraries_timecho.md b/src/UserGuide/V1.3.x/SQL-Manual/UDF-Libraries_timecho.md
index 84541c7f..2f719fe1 100644
--- a/src/UserGuide/V1.3.x/SQL-Manual/UDF-Libraries_timecho.md
+++ b/src/UserGuide/V1.3.x/SQL-Manual/UDF-Libraries_timecho.md
@@ -4976,3 +4976,89 @@ Output Series:
 ```
 
 
+### Cluster
+
+#### Registration statement
+
+```sql
+create function cluster as 'org.apache.iotdb.library.dlearn.UDTFCluster'
+```
+
+#### Usage
+
+This function takes a **single input time series**, splits it into **non-overlapping** contiguous subsequences (windows) of fixed length `l`, and clusters those subsequences into `k` groups.
+
+**Name:** Cluster
+
+**Input Series:** Only supports a single numeric input series of type INT32 / INT64 / FLOAT / DOUBLE. Points are read in time order; trailing samples that do not fill a full window are dropped (only `⌊n/l⌋` windows are used, where `n` is the number of valid points).
+
+**Parameters:**
+
+| Name | Meaning | Default | Notes |
+|------|---------|---------|--------|
+| `l` | Subsequence (window) length | (required) | Positive integer; each window contains `l` consecutive samples. |
+| `k` | Number of clusters | (required) | Integer ≥ 2. |
+| `method` | Clustering algorithm | `kmeans` | One of `kmeans`, `kshape`, `medoidshape` (case-insensitive). Defaults to k-means if omitted. |
+| `norm` | Z-score normalize each subsequence | `true` | Boolean; if `true`, each subsequence is standardized before clustering. |
+| `maxiter` | Maximum number of iterations | `200` | Positive integer. |
+| `output` | Output mode | `label` | `label`: one cluster id per window; `centroid`: the `k` centroid vectors, concatenated in cluster order. |
+| `sample_rate` | Greedy sampling rate | `0.3` | Used only when **`method` = `medoidshape`**; must be in `(0, 1]`. |
+
+
+**`method` details:**
+
+- **kmeans**: k-means in Euclidean space (optionally after per-window normalization).
+- **kshape**: assigns windows to clusters by shape-based distance (SBD, derived from the normalized cross-correlation, NCC); centroids are updated via SVD of the cluster matrix.
+- **medoidshape**: performs a coarse clustering, then greedily selects `k` representative subsequences; `sample_rate` controls how many candidates are sampled in each round.
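The `kmeans` option is described as k-means in Euclidean space. For readers unfamiliar with it, here is a minimal sketch of Lloyd's algorithm in which each window is one point. It is not the UDF's implementation: initialization from the first `k` points and keeping the old centroid when a cluster empties are assumptions, and the `maxiter` argument only plays the role of the documented parameter.

```python
import math
from typing import List, Sequence, Tuple


def kmeans(points: Sequence[Sequence[float]], k: int,
           maxiter: int = 200) -> Tuple[List[int], List[List[float]]]:
    """Plain Lloyd's k-means in Euclidean space.

    Assumption: centroids start as the first k points; the UDF's actual
    initialization strategy is not documented.
    """
    if k < 1 or k > len(points):
        raise ValueError("need 1 <= k <= number of points")
    centroids = [list(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(maxiter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no label changed
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


labels, centroids = kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], 2)
print(labels)  # [0, 0, 1, 1]
```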
+
+**Output Series:** Controlled by `output`:
+
+- **`output` = `label` (default):** One output series of type **INT32** with one point per full window, i.e. `⌊n/l⌋` points. Each point's timestamp is the **time of the first sample** in its window; its value is the cluster id **0 … k−1**.
+- **`output` = `centroid`:** One output series of type **DOUBLE** with **`k × l`** points: for clusters **0 → k−1**, the `l` components of each centroid are emitted in order (concatenated). Timestamps are `0, 1, 2, …` and serve only as placeholders with no physical time meaning.
+
+**Note:**
+
+- The number of valid points must satisfy `n ≥ l`, and the number of windows must satisfy `⌊n/l⌋ ≥ k`.
+
+#### Examples
+
+##### KShape: window length 3, k = 2
+
+Nine samples `{1,2,3,10,20,30,1,5,1}` form three non-overlapping windows: `{1,2,3}`, `{10,20,30}`, and `{1,5,1}`. With **`method` = `kshape`** (and the default `norm` = `true`), each output row holds the cluster id of one window and is timestamped with the window's start time. The resulting labels are **0, 0, 1**.
+
+Input Series:
+
+```
++-----------------------------+---------------+
+|                         Time|root.test.d0.s0|
++-----------------------------+---------------+
+|2020-01-01T00:00:01.000+08:00|            1.0|
+|2020-01-01T00:00:02.000+08:00|            2.0|
+|2020-01-01T00:00:03.000+08:00|            3.0|
+|2020-01-01T00:00:04.000+08:00|           10.0|
+|2020-01-01T00:00:05.000+08:00|           20.0|
+|2020-01-01T00:00:06.000+08:00|           30.0|
+|2020-01-01T00:00:07.000+08:00|            1.0|
+|2020-01-01T00:00:08.000+08:00|            5.0|
+|2020-01-01T00:00:09.000+08:00|            1.0|
++-----------------------------+---------------+
+```
+
+SQL for query:
+
+```sql
+select cluster(s0, "l"="3", "k"="2", "method"="kshape", "output"="label")
+from root.test.d0
+```
+
+Output Series:
+
+```
++-----------------------------+----------------------------------------------------------------------------+
+|                         Time|cluster(root.test.d0.s0,"l"="3","k"="2","method"="kshape","output"="label")|
++-----------------------------+----------------------------------------------------------------------------+
+|2020-01-01T00:00:01.000+08:00|                                                                          0|
+|2020-01-01T00:00:04.000+08:00|                                                                          0|
+|2020-01-01T00:00:07.000+08:00|                                                                          1|
++-----------------------------+----------------------------------------------------------------------------+
+```
\ No newline at end of file
diff --git a/src/UserGuide/latest/SQL-Manual/UDF-Libraries_apache.md b/src/UserGuide/latest/SQL-Manual/UDF-Libraries_apache.md
index 617026aa..806a1375 100644
--- a/src/UserGuide/latest/SQL-Manual/UDF-Libraries_apache.md
+++ b/src/UserGuide/latest/SQL-Manual/UDF-Libraries_apache.md
@@ -4894,3 +4894,89 @@ Output Series:
 |1970-01-01T08:00:00.002+08:00|                    -0.2571|
 +-----------------------------+---------------------------+
 ```
+
+### 9.2 Cluster
+
+#### Registration statement
+
+```sql
+create function cluster as 'org.apache.iotdb.library.dlearn.UDTFCluster'
+```
+
+#### Usage
+
+This function takes a **single input time series**, splits it into **non-overlapping** contiguous subsequences (windows) of fixed length `l`, and clusters those subsequences into `k` groups.
+
+**Name:** Cluster
+
+**Input Series:** Only supports a single numeric input series of type INT32 / INT64 / FLOAT / DOUBLE. Points are read in time order; trailing samples that do not fill a full window are dropped (only `⌊n/l⌋` windows are used, where `n` is the number of valid points).
+
+**Parameters:**
+
+| Name | Meaning | Default | Notes |
+|------|---------|---------|--------|
+| `l` | Subsequence (window) length | (required) | Positive integer; each window contains `l` consecutive samples. |
+| `k` | Number of clusters | (required) | Integer ≥ 2. |
+| `method` | Clustering algorithm | `kmeans` | One of `kmeans`, `kshape`, `medoidshape` (case-insensitive). Defaults to k-means if omitted. |
+| `norm` | Z-score normalize each subsequence | `true` | Boolean; if `true`, each subsequence is standardized before clustering. |
+| `maxiter` | Maximum number of iterations | `200` | Positive integer. |
+| `output` | Output mode | `label` | `label`: one cluster id per window; `centroid`: the `k` centroid vectors, concatenated in cluster order. |
+| `sample_rate` | Greedy sampling rate | `0.3` | Used only when **`method` = `medoidshape`**; must be in `(0, 1]`. |
+
+
+**`method` details:**
+
+- **kmeans**: k-means in Euclidean space (optionally after per-window normalization).
+- **kshape**: assigns windows to clusters by shape-based distance (SBD, derived from the normalized cross-correlation, NCC); centroids are updated via SVD of the cluster matrix.
+- **medoidshape**: performs a coarse clustering, then greedily selects `k` representative subsequences; `sample_rate` controls how many candidates are sampled in each round.
+
+**Output Series:** Controlled by `output`:
+
+- **`output` = `label` (default):** One output series of type **INT32** with one point per full window, i.e. `⌊n/l⌋` points. Each point's timestamp is the **time of the first sample** in its window; its value is the cluster id **0 … k−1**.
+- **`output` = `centroid`:** One output series of type **DOUBLE** with **`k × l`** points: for clusters **0 → k−1**, the `l` components of each centroid are emitted in order (concatenated). Timestamps are `0, 1, 2, …` and serve only as placeholders with no physical time meaning.
+
+**Note:**
+
+- The number of valid points must satisfy `n ≥ l`, and the number of windows must satisfy `⌊n/l⌋ ≥ k`.
+
+#### Examples
+
+##### KShape: window length 3, k = 2
+
+Nine samples `{1,2,3,10,20,30,1,5,1}` form three non-overlapping windows: `{1,2,3}`, `{10,20,30}`, and `{1,5,1}`. With **`method` = `kshape`** (and the default `norm` = `true`), each output row holds the cluster id of one window and is timestamped with the window's start time. The resulting labels are **0, 0, 1**.
+
+Input Series:
+
+```
++-----------------------------+---------------+
+|                         Time|root.test.d0.s0|
++-----------------------------+---------------+
+|2020-01-01T00:00:01.000+08:00|            1.0|
+|2020-01-01T00:00:02.000+08:00|            2.0|
+|2020-01-01T00:00:03.000+08:00|            3.0|
+|2020-01-01T00:00:04.000+08:00|           10.0|
+|2020-01-01T00:00:05.000+08:00|           20.0|
+|2020-01-01T00:00:06.000+08:00|           30.0|
+|2020-01-01T00:00:07.000+08:00|            1.0|
+|2020-01-01T00:00:08.000+08:00|            5.0|
+|2020-01-01T00:00:09.000+08:00|            1.0|
++-----------------------------+---------------+
+```
+
+SQL for query:
+
+```sql
+select cluster(s0, "l"="3", "k"="2", "method"="kshape", "output"="label")
+from root.test.d0
+```
+
+Output Series:
+
+```
++-----------------------------+----------------------------------------------------------------------------+
+|                         Time|cluster(root.test.d0.s0,"l"="3","k"="2","method"="kshape","output"="label")|
++-----------------------------+----------------------------------------------------------------------------+
+|2020-01-01T00:00:01.000+08:00|                                                                          0|
+|2020-01-01T00:00:04.000+08:00|                                                                          0|
+|2020-01-01T00:00:07.000+08:00|                                                                          1|
++-----------------------------+----------------------------------------------------------------------------+
diff --git a/src/UserGuide/latest/SQL-Manual/UDF-Libraries_timecho.md b/src/UserGuide/latest/SQL-Manual/UDF-Libraries_timecho.md
index c2e76b94..b4f54e05 100644
--- a/src/UserGuide/latest/SQL-Manual/UDF-Libraries_timecho.md
+++ b/src/UserGuide/latest/SQL-Manual/UDF-Libraries_timecho.md
@@ -4975,3 +4975,89 @@ Output Series:
 +-----------------------------+---------------------------+
 ```
 
+### 9.2 Cluster
+
+#### Registration statement
+
+```sql
+create function cluster as 'org.apache.iotdb.library.dlearn.UDTFCluster'
+```
+
+#### Usage
+
+This function takes a **single input time series**, splits it into **non-overlapping** contiguous subsequences (windows) of fixed length `l`, and clusters those subsequences into `k` groups.
+
+**Name:** Cluster
+
+**Input Series:** Only supports a single numeric input series of type INT32 / INT64 / FLOAT / DOUBLE. Points are read in time order; trailing samples that do not fill a full window are dropped (only `⌊n/l⌋` windows are used, where `n` is the number of valid points).
+
+**Parameters:**
+
+| Name | Meaning | Default | Notes |
+|------|---------|---------|--------|
+| `l` | Subsequence (window) length | (required) | Positive integer; each window contains `l` consecutive samples. |
+| `k` | Number of clusters | (required) | Integer ≥ 2. |
+| `method` | Clustering algorithm | `kmeans` | One of `kmeans`, `kshape`, `medoidshape` (case-insensitive). Defaults to k-means if omitted. |
+| `norm` | Z-score normalize each subsequence | `true` | Boolean; if `true`, each subsequence is standardized before clustering. |
+| `maxiter` | Maximum number of iterations | `200` | Positive integer. |
+| `output` | Output mode | `label` | `label`: one cluster id per window; `centroid`: the `k` centroid vectors, concatenated in cluster order. |
+| `sample_rate` | Greedy sampling rate | `0.3` | Used only when **`method` = `medoidshape`**; must be in `(0, 1]`. |
+
+
+**`method` details:**
+
+- **kmeans**: k-means in Euclidean space (optionally after per-window normalization).
+- **kshape**: assigns windows to clusters by shape-based distance (SBD, derived from the normalized cross-correlation, NCC); centroids are updated via SVD of the cluster matrix.
+- **medoidshape**: performs a coarse clustering, then greedily selects `k` representative subsequences; `sample_rate` controls how many candidates are sampled in each round.
+
+**Output Series:** Controlled by `output`:
+
+- **`output` = `label` (default):** One output series of type **INT32** with one point per full window, i.e. `⌊n/l⌋` points. Each point's timestamp is the **time of the first sample** in its window; its value is the cluster id **0 … k−1**.
+- **`output` = `centroid`:** One output series of type **DOUBLE** with **`k × l`** points: for clusters **0 → k−1**, the `l` components of each centroid are emitted in order (concatenated). Timestamps are `0, 1, 2, …` and serve only as placeholders with no physical time meaning.
+
+**Note:**
+
+- The number of valid points must satisfy `n ≥ l`, and the number of windows must satisfy `⌊n/l⌋ ≥ k`.
+
+#### Examples
+
+##### KShape: window length 3, k = 2
+
+Nine samples `{1,2,3,10,20,30,1,5,1}` form three non-overlapping windows: `{1,2,3}`, `{10,20,30}`, and `{1,5,1}`. With **`method` = `kshape`** (and the default `norm` = `true`), each output row holds the cluster id of one window and is timestamped with the window's start time. The resulting labels are **0, 0, 1**.
+
+Input Series:
+
+```
++-----------------------------+---------------+
+|                         Time|root.test.d0.s0|
++-----------------------------+---------------+
+|2020-01-01T00:00:01.000+08:00|            1.0|
+|2020-01-01T00:00:02.000+08:00|            2.0|
+|2020-01-01T00:00:03.000+08:00|            3.0|
+|2020-01-01T00:00:04.000+08:00|           10.0|
+|2020-01-01T00:00:05.000+08:00|           20.0|
+|2020-01-01T00:00:06.000+08:00|           30.0|
+|2020-01-01T00:00:07.000+08:00|            1.0|
+|2020-01-01T00:00:08.000+08:00|            5.0|
+|2020-01-01T00:00:09.000+08:00|            1.0|
++-----------------------------+---------------+
+```
+
+SQL for query:
+
+```sql
+select cluster(s0, "l"="3", "k"="2", "method"="kshape", "output"="label")
+from root.test.d0
+```
+
+Output Series:
+
+```
++-----------------------------+----------------------------------------------------------------------------+
+|                         Time|cluster(root.test.d0.s0,"l"="3","k"="2","method"="kshape","output"="label")|
++-----------------------------+----------------------------------------------------------------------------+
+|2020-01-01T00:00:01.000+08:00|                                                                          0|
+|2020-01-01T00:00:04.000+08:00|                                                                          0|
+|2020-01-01T00:00:07.000+08:00|                                                                          1|
++-----------------------------+----------------------------------------------------------------------------+
+```
diff --git a/src/zh/UserGuide/V1.3.x/SQL-Manual/UDF-Libraries_apache.md b/src/zh/UserGuide/V1.3.x/SQL-Manual/UDF-Libraries_apache.md
index 80cc6be4..a207b74f 100644
--- a/src/zh/UserGuide/V1.3.x/SQL-Manual/UDF-Libraries_apache.md
+++ b/src/zh/UserGuide/V1.3.x/SQL-Manual/UDF-Libraries_apache.md
@@ -5016,3 +5016,89 @@ select ar(s0,"p"="2") from root.test.d0
 |1970-01-01T08:00:00.002+08:00|                    -0.2571|
 +-----------------------------+---------------------------+
 ```
+
+### Cluster
+
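+#### Registration statement
+
+```sql
+create function cluster as 'org.apache.iotdb.library.dlearn.UDTFCluster'
+```
+
+#### Usage
+
+This function splits a **single input time series** into **non-overlapping** contiguous subsequences (windows) of fixed length `l`, then clusters these subsequences into `k` groups.
+
+**Name:** Cluster
+
+**Input Series:** Only a single numeric time series is supported, of type INT32 / INT64 / FLOAT / DOUBLE. Points are read in time order; trailing samples that do not fill a complete window are **dropped** (only `⌊n/l⌋` windows are used, where `n` is the number of valid points).
+
+**Parameters:**
+
+| Name | Meaning | Default | Notes |
+|------|---------|---------|-------|
+| `l` | Subsequence (window) length | (required) | Positive integer; each window contains `l` consecutive samples. |
+| `k` | Number of clusters | (required) | Integer ≥ 2. |
+| `method` | Clustering algorithm | `kmeans` | One of `kmeans`, `kshape`, `medoidshape` (case-insensitive). Defaults to k-means if omitted. |
+| `norm` | Whether to Z-score normalize each subsequence | `true` | Boolean; if `true`, each subsequence is standardized before clustering. |
+| `maxiter` | Maximum number of iterations | `200` | Positive integer. |
+| `output` | Output mode | `label` | `label`: one cluster id per window; `centroid`: the `k` centroid vectors, concatenated in cluster order. |
+| `sample_rate` | Greedy sampling ratio | `0.3` | Used only when **`method` = `medoidshape`**; must be in `(0, 1]`. |
+
+**`method` details:**
+
+- **kmeans**: k-means in Euclidean space (per-window normalization beforehand is optional).
+- **kshape**: assigns clusters by shape-based distance (SBD, derived from the normalized cross-correlation, NCC); centroids are updated via **SVD** of the cluster matrix.
+- **medoidshape**: performs a coarse clustering first, then greedily selects `k` representative subsequences; `sample_rate` controls the number of candidates sampled in each round.
+
+**Output Series:** Controlled by `output`:
+
+- **`output` = `label` (default):** One output series of type **INT32**. Number of rows = number of complete windows, `⌊n/l⌋`. The timestamp of each row is the time of the window's **first sample**; the value is the cluster id **0 … k−1**.
+- **`output` = `centroid`:** One output series of type **DOUBLE**. Number of rows = **`k × l`**: for clusters **0 → k−1** in order, the `l` components of each centroid are emitted (concatenated). Timestamps are `0, 1, 2, …` (placeholders only, with no physical time meaning).
+
+**Note:**
+
+- The number of valid points must satisfy `n ≥ l`, and the number of windows must satisfy `⌊n/l⌋ ≥ k`.
+
+#### Examples
+
+##### KShape: window length 3, k = 2
+
+Nine samples `{1,2,3,10,20,30,1,5,1}` form three non-overlapping windows of length 3: `{1,2,3}`, `{10,20,30}`, and `{1,5,1}`. With **`method` = `kshape`** and the default **`norm` = `true`**, each row holds the cluster id of one window, and its timestamp is the window's start time. The resulting labels are **0, 0, 1**.
+
+Input Series: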
+
+```
++-----------------------------+---------------+
+|                         Time|root.test.d0.s0|
++-----------------------------+---------------+
+|2020-01-01T00:00:01.000+08:00|            1.0|
+|2020-01-01T00:00:02.000+08:00|            2.0|
+|2020-01-01T00:00:03.000+08:00|            3.0|
+|2020-01-01T00:00:04.000+08:00|           10.0|
+|2020-01-01T00:00:05.000+08:00|           20.0|
+|2020-01-01T00:00:06.000+08:00|           30.0|
+|2020-01-01T00:00:07.000+08:00|            1.0|
+|2020-01-01T00:00:08.000+08:00|            5.0|
+|2020-01-01T00:00:09.000+08:00|            1.0|
++-----------------------------+---------------+
+```
+
+SQL for query:
+
+```sql
+select cluster(s0, "l"="3", "k"="2", "method"="kshape", "output"="label")
+from root.test.d0
+```
+
+Output Series:
+
+```
++-----------------------------+----------------------------------------------------------------------------+
+|                         Time|cluster(root.test.d0.s0,"l"="3","k"="2","method"="kshape","output"="label")|
++-----------------------------+----------------------------------------------------------------------------+
+|2020-01-01T00:00:01.000+08:00|                                                                          0|
+|2020-01-01T00:00:04.000+08:00|                                                                          0|
+|2020-01-01T00:00:07.000+08:00|                                                                          1|
++-----------------------------+----------------------------------------------------------------------------+
+```
diff --git a/src/zh/UserGuide/V1.3.x/SQL-Manual/UDF-Libraries_timecho.md b/src/zh/UserGuide/V1.3.x/SQL-Manual/UDF-Libraries_timecho.md
index f361f26c..df869257 100644
--- a/src/zh/UserGuide/V1.3.x/SQL-Manual/UDF-Libraries_timecho.md
+++ b/src/zh/UserGuide/V1.3.x/SQL-Manual/UDF-Libraries_timecho.md
@@ -5003,3 +5003,89 @@ select ar(s0,"p"="2") from root.test.d0
 |1970-01-01T08:00:00.002+08:00|                    -0.2571|
 +-----------------------------+---------------------------+
 ```
+
+### Cluster
+
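+#### Registration statement
+
+```sql
+create function cluster as 'org.apache.iotdb.library.dlearn.UDTFCluster'
+```
+
+#### Usage
+
+This function splits a **single input time series** into **non-overlapping** contiguous subsequences (windows) of fixed length `l`, then clusters these subsequences into `k` groups.
+
+**Name:** Cluster
+
+**Input Series:** Only a single numeric time series is supported, of type INT32 / INT64 / FLOAT / DOUBLE. Points are read in time order; trailing samples that do not fill a complete window are **dropped** (only `⌊n/l⌋` windows are used, where `n` is the number of valid points).
+
+**Parameters:**
+
+| Name | Meaning | Default | Notes |
+|------|---------|---------|-------|
+| `l` | Subsequence (window) length | (required) | Positive integer; each window contains `l` consecutive samples. |
+| `k` | Number of clusters | (required) | Integer ≥ 2. |
+| `method` | Clustering algorithm | `kmeans` | One of `kmeans`, `kshape`, `medoidshape` (case-insensitive). Defaults to k-means if omitted. |
+| `norm` | Whether to Z-score normalize each subsequence | `true` | Boolean; if `true`, each subsequence is standardized before clustering. |
+| `maxiter` | Maximum number of iterations | `200` | Positive integer. |
+| `output` | Output mode | `label` | `label`: one cluster id per window; `centroid`: the `k` centroid vectors, concatenated in cluster order. |
+| `sample_rate` | Greedy sampling ratio | `0.3` | Used only when **`method` = `medoidshape`**; must be in `(0, 1]`. |
+
+**`method` details:**
+
+- **kmeans**: k-means in Euclidean space (per-window normalization beforehand is optional).
+- **kshape**: assigns clusters by shape-based distance (SBD, derived from the normalized cross-correlation, NCC); centroids are updated via **SVD** of the cluster matrix.
+- **medoidshape**: performs a coarse clustering first, then greedily selects `k` representative subsequences; `sample_rate` controls the number of candidates sampled in each round.
+
+**Output Series:** Controlled by `output`:
+
+- **`output` = `label` (default):** One output series of type **INT32**. Number of rows = number of complete windows, `⌊n/l⌋`. The timestamp of each row is the time of the window's **first sample**; the value is the cluster id **0 … k−1**.
+- **`output` = `centroid`:** One output series of type **DOUBLE**. Number of rows = **`k × l`**: for clusters **0 → k−1** in order, the `l` components of each centroid are emitted (concatenated). Timestamps are `0, 1, 2, …` (placeholders only, with no physical time meaning).
+
+**Note:**
+
+- The number of valid points must satisfy `n ≥ l`, and the number of windows must satisfy `⌊n/l⌋ ≥ k`.
+
+#### Examples
+
+##### KShape: window length 3, k = 2
+
+Nine samples `{1,2,3,10,20,30,1,5,1}` form three non-overlapping windows of length 3: `{1,2,3}`, `{10,20,30}`, and `{1,5,1}`. With **`method` = `kshape`** and the default **`norm` = `true`**, each row holds the cluster id of one window, and its timestamp is the window's start time. The resulting labels are **0, 0, 1**.
+
+Input Series: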
+
+```
++-----------------------------+---------------+
+|                         Time|root.test.d0.s0|
++-----------------------------+---------------+
+|2020-01-01T00:00:01.000+08:00|            1.0|
+|2020-01-01T00:00:02.000+08:00|            2.0|
+|2020-01-01T00:00:03.000+08:00|            3.0|
+|2020-01-01T00:00:04.000+08:00|           10.0|
+|2020-01-01T00:00:05.000+08:00|           20.0|
+|2020-01-01T00:00:06.000+08:00|           30.0|
+|2020-01-01T00:00:07.000+08:00|            1.0|
+|2020-01-01T00:00:08.000+08:00|            5.0|
+|2020-01-01T00:00:09.000+08:00|            1.0|
++-----------------------------+---------------+
+```
+
+SQL for query:
+
+```sql
+select cluster(s0, "l"="3", "k"="2", "method"="kshape", "output"="label")
+from root.test.d0
+```
+
+Output Series:
+
+```
++-----------------------------+----------------------------------------------------------------------------+
+|                         Time|cluster(root.test.d0.s0,"l"="3","k"="2","method"="kshape","output"="label")|
++-----------------------------+----------------------------------------------------------------------------+
+|2020-01-01T00:00:01.000+08:00|                                                                          0|
+|2020-01-01T00:00:04.000+08:00|                                                                          0|
+|2020-01-01T00:00:07.000+08:00|                                                                          1|
++-----------------------------+----------------------------------------------------------------------------+
+```
diff --git a/src/zh/UserGuide/latest/SQL-Manual/UDF-Libraries_apache.md b/src/zh/UserGuide/latest/SQL-Manual/UDF-Libraries_apache.md
index 1bd7addc..834a767a 100644
--- a/src/zh/UserGuide/latest/SQL-Manual/UDF-Libraries_apache.md
+++ b/src/zh/UserGuide/latest/SQL-Manual/UDF-Libraries_apache.md
@@ -4953,3 +4953,91 @@ select ar(s0,"p"="2") from root.test.d0
 |1970-01-01T08:00:00.002+08:00|                    -0.2571|
 +-----------------------------+---------------------------+
 ```
+
+### 9.2 Cluster
+
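+#### Registration statement
+
+```sql
+create function cluster as 'org.apache.iotdb.library.dlearn.UDTFCluster'
+```
+
+#### Usage
+
+This function splits a **single input time series** into **non-overlapping** contiguous subsequences (windows) of fixed length `l`, then clusters these subsequences into `k` groups.
+
+**Name:** Cluster
+
+**Input Series:** Only a single numeric time series is supported, of type INT32 / INT64 / FLOAT / DOUBLE. Points are read in time order; trailing samples that do not fill a complete window are **dropped** (only `⌊n/l⌋` windows are used, where `n` is the number of valid points).
+
+**Parameters:**
+
+| Name | Meaning | Default | Notes |
+|------|---------|---------|-------|
+| `l` | Subsequence (window) length | (required) | Positive integer; each window contains `l` consecutive samples. |
+| `k` | Number of clusters | (required) | Integer ≥ 2. |
+| `method` | Clustering algorithm | `kmeans` | One of `kmeans`, `kshape`, `medoidshape` (case-insensitive). Defaults to k-means if omitted. |
+| `norm` | Whether to Z-score normalize each subsequence | `true` | Boolean; if `true`, each subsequence is standardized before clustering. |
+| `maxiter` | Maximum number of iterations | `200` | Positive integer. |
+| `output` | Output mode | `label` | `label`: one cluster id per window; `centroid`: the `k` centroid vectors, concatenated in cluster order. |
+| `sample_rate` | Greedy sampling ratio | `0.3` | Used only when **`method` = `medoidshape`**; must be in `(0, 1]`. |
+
+**`method` details:**
+
+- **kmeans**: k-means in Euclidean space (per-window normalization beforehand is optional).
+- **kshape**: assigns clusters by shape-based distance (SBD, derived from the normalized cross-correlation, NCC); centroids are updated via **SVD** of the cluster matrix.
+- **medoidshape**: performs a coarse clustering first, then greedily selects `k` representative subsequences; `sample_rate` controls the number of candidates sampled in each round.
+
+**Output Series:** Controlled by `output`:
+
+- **`output` = `label` (default):** One output series of type **INT32**. Number of rows = number of complete windows, `⌊n/l⌋`. The timestamp of each row is the time of the window's **first sample**; the value is the cluster id **0 … k−1**.
+- **`output` = `centroid`:** One output series of type **DOUBLE**. Number of rows = **`k × l`**: for clusters **0 → k−1** in order, the `l` components of each centroid are emitted (concatenated). Timestamps are `0, 1, 2, …` (placeholders only, with no physical time meaning).
+
+**Note:**
+
+- The number of valid points must satisfy `n ≥ l`, and the number of windows must satisfy `⌊n/l⌋ ≥ k`.
+
+#### Examples
+
+##### KShape: window length 3, k = 2
+
+Nine samples `{1,2,3,10,20,30,1,5,1}` form three non-overlapping windows of length 3: `{1,2,3}`, `{10,20,30}`, and `{1,5,1}`. With **`method` = `kshape`** and the default **`norm` = `true`**, each row holds the cluster id of one window, and its timestamp is the window's start time. The resulting labels are **0, 0, 1**.
+
+Input Series: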
+
+```
++-----------------------------+---------------+
+|                         Time|root.test.d0.s0|
++-----------------------------+---------------+
+|2020-01-01T00:00:01.000+08:00|            1.0|
+|2020-01-01T00:00:02.000+08:00|            2.0|
+|2020-01-01T00:00:03.000+08:00|            3.0|
+|2020-01-01T00:00:04.000+08:00|           10.0|
+|2020-01-01T00:00:05.000+08:00|           20.0|
+|2020-01-01T00:00:06.000+08:00|           30.0|
+|2020-01-01T00:00:07.000+08:00|            1.0|
+|2020-01-01T00:00:08.000+08:00|            5.0|
+|2020-01-01T00:00:09.000+08:00|            1.0|
++-----------------------------+---------------+
+```
+
+SQL for query:
+
+```sql
+select cluster(s0, "l"="3", "k"="2", "method"="kshape", "output"="label")
+from root.test.d0
+```
+
+Output Series:
+
+```
++-----------------------------+----------------------------------------------------------------------------+
+|                         Time|cluster(root.test.d0.s0,"l"="3","k"="2","method"="kshape","output"="label")|
++-----------------------------+----------------------------------------------------------------------------+
+|2020-01-01T00:00:01.000+08:00|                                                                          0|
+|2020-01-01T00:00:04.000+08:00|                                                                          0|
+|2020-01-01T00:00:07.000+08:00|                                                                          1|
++-----------------------------+----------------------------------------------------------------------------+
+```
+
+
diff --git a/src/zh/UserGuide/latest/SQL-Manual/UDF-Libraries_timecho.md b/src/zh/UserGuide/latest/SQL-Manual/UDF-Libraries_timecho.md
index b26f6b66..2ef07e3b 100644
--- a/src/zh/UserGuide/latest/SQL-Manual/UDF-Libraries_timecho.md
+++ b/src/zh/UserGuide/latest/SQL-Manual/UDF-Libraries_timecho.md
@@ -5005,3 +5005,89 @@ select ar(s0,"p"="2") from root.test.d0
 |1970-01-01T08:00:00.002+08:00|                    -0.2571|
 +-----------------------------+---------------------------+
 ```
+
+### 9.2 Cluster
+
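+#### Registration statement
+
+```sql
+create function cluster as 'org.apache.iotdb.library.dlearn.UDTFCluster'
+```
+
+#### Usage
+
+This function splits a **single input time series** into **non-overlapping** contiguous subsequences (windows) of fixed length `l`, then clusters these subsequences into `k` groups.
+
+**Name:** Cluster
+
+**Input Series:** Only a single numeric time series is supported, of type INT32 / INT64 / FLOAT / DOUBLE. Points are read in time order; trailing samples that do not fill a complete window are **dropped** (only `⌊n/l⌋` windows are used, where `n` is the number of valid points).
+
+**Parameters:**
+
+| Name | Meaning | Default | Notes |
+|------|---------|---------|-------|
+| `l` | Subsequence (window) length | (required) | Positive integer; each window contains `l` consecutive samples. |
+| `k` | Number of clusters | (required) | Integer ≥ 2. |
+| `method` | Clustering algorithm | `kmeans` | One of `kmeans`, `kshape`, `medoidshape` (case-insensitive). Defaults to k-means if omitted. |
+| `norm` | Whether to Z-score normalize each subsequence | `true` | Boolean; if `true`, each subsequence is standardized before clustering. |
+| `maxiter` | Maximum number of iterations | `200` | Positive integer. |
+| `output` | Output mode | `label` | `label`: one cluster id per window; `centroid`: the `k` centroid vectors, concatenated in cluster order. |
+| `sample_rate` | Greedy sampling ratio | `0.3` | Used only when **`method` = `medoidshape`**; must be in `(0, 1]`. |
+
+**`method` details:**
+
+- **kmeans**: k-means in Euclidean space (per-window normalization beforehand is optional).
+- **kshape**: assigns clusters by shape-based distance (SBD, derived from the normalized cross-correlation, NCC); centroids are updated via **SVD** of the cluster matrix.
+- **medoidshape**: performs a coarse clustering first, then greedily selects `k` representative subsequences; `sample_rate` controls the number of candidates sampled in each round.
+
+**Output Series:** Controlled by `output`:
+
+- **`output` = `label` (default):** One output series of type **INT32**. Number of rows = number of complete windows, `⌊n/l⌋`. The timestamp of each row is the time of the window's **first sample**; the value is the cluster id **0 … k−1**.
+- **`output` = `centroid`:** One output series of type **DOUBLE**. Number of rows = **`k × l`**: for clusters **0 → k−1** in order, the `l` components of each centroid are emitted (concatenated). Timestamps are `0, 1, 2, …` (placeholders only, with no physical time meaning).
+
+**Note:**
+
+- The number of valid points must satisfy `n ≥ l`, and the number of windows must satisfy `⌊n/l⌋ ≥ k`.
+
+#### Examples
+
+##### KShape: window length 3, k = 2
+
+Nine samples `{1,2,3,10,20,30,1,5,1}` form three non-overlapping windows of length 3: `{1,2,3}`, `{10,20,30}`, and `{1,5,1}`. With **`method` = `kshape`** and the default **`norm` = `true`**, each row holds the cluster id of one window, and its timestamp is the window's start time. The resulting labels are **0, 0, 1**.
+
+Input Series: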
+
+```
++-----------------------------+---------------+
+|                         Time|root.test.d0.s0|
++-----------------------------+---------------+
+|2020-01-01T00:00:01.000+08:00|            1.0|
+|2020-01-01T00:00:02.000+08:00|            2.0|
+|2020-01-01T00:00:03.000+08:00|            3.0|
+|2020-01-01T00:00:04.000+08:00|           10.0|
+|2020-01-01T00:00:05.000+08:00|           20.0|
+|2020-01-01T00:00:06.000+08:00|           30.0|
+|2020-01-01T00:00:07.000+08:00|            1.0|
+|2020-01-01T00:00:08.000+08:00|            5.0|
+|2020-01-01T00:00:09.000+08:00|            1.0|
++-----------------------------+---------------+
+```
+
+SQL for query:
+
+```sql
+select cluster(s0, "l"="3", "k"="2", "method"="kshape", "output"="label")
+from root.test.d0
+```
+
+Output Series:
+
+```
++-----------------------------+----------------------------------------------------------------------------+
+|                         Time|cluster(root.test.d0.s0,"l"="3","k"="2","method"="kshape","output"="label")|
++-----------------------------+----------------------------------------------------------------------------+
+|2020-01-01T00:00:01.000+08:00|                                                                          0|
+|2020-01-01T00:00:04.000+08:00|                                                                          0|
+|2020-01-01T00:00:07.000+08:00|                                                                          1|
++-----------------------------+----------------------------------------------------------------------------+
+```
