This is an automated email from the ASF dual-hosted git repository.
JingsongLi pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/paimon.git
The following commit(s) were added to refs/heads/master by this push:
new 34050f0b82 [docs] Add clustering.incremental.mode documentation for
incremental clustering (#8161)
34050f0b82 is described below
commit 34050f0b8200b531089001473c6bcc2baefedeb5
Author: sanshi <[email protected]>
AuthorDate: Mon Jun 8 17:47:15 2026 +0800
[docs] Add clustering.incremental.mode documentation for incremental
clustering (#8161)
---
docs/docs/append-table/incremental-clustering.mdx | 35 ++++++++++++++++++++++-
1 file changed, 34 insertions(+), 1 deletion(-)
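The difference between the two modes documented in this commit can be pictured with a toy model. The sketch below is not Paimon's implementation: the function names are hypothetical, each "task" is a plain Python list of keys, and the split points come from exact quantiles, whereas a real engine would presumably sample. It only aims to show why global-sort yields non-overlapping key ranges across output files while local-sort does not.

```python
from bisect import bisect_right


def local_sort(task_inputs):
    # Hypothetical model of local-sort: each task sorts only the rows
    # it already holds; no rows move between tasks.
    return [sorted(rows) for rows in task_inputs]


def global_sort(task_inputs):
    # Hypothetical model of global-sort: range-shuffle rows so each task
    # owns one contiguous key range, then sort locally within each task.
    all_rows = sorted(r for rows in task_inputs for r in rows)
    n = len(task_inputs)
    if not all_rows:
        return [[] for _ in range(n)]
    # Split points taken from exact quantiles (an assumption of this sketch).
    bounds = [all_rows[len(all_rows) * i // n] for i in range(1, n)]
    shuffled = [[] for _ in range(n)]
    for rows in task_inputs:
        for r in rows:
            shuffled[bisect_right(bounds, r)].append(r)
    return local_sort(shuffled)


def ranges_overlap(files):
    # True if any file's min key falls inside the previous file's key range.
    spans = sorted((f[0], f[-1]) for f in files if f)
    return any(lo <= prev_hi for (_, prev_hi), (lo, _) in zip(spans, spans[1:]))


tasks = [[5, 1, 9], [2, 8, 4], [7, 3, 6]]
print(local_sort(tasks), ranges_overlap(local_sort(tasks)))
print(global_sort(tasks), ranges_overlap(global_sort(tasks)))
```

In this model, a reader filtering on a key range can skip whole files after global-sort, while after local-sort it can only skip within a file; that is the trade-off against the shuffle cost that the option description mentions.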
diff --git a/docs/docs/append-table/incremental-clustering.mdx b/docs/docs/append-table/incremental-clustering.mdx
index b8a2108d40..54c146c4ba 100644
--- a/docs/docs/append-table/incremental-clustering.mdx
+++ b/docs/docs/append-table/incremental-clustering.mdx
@@ -82,6 +82,13 @@ To enable Incremental Clustering, the following
configuration needs to be set fo
<td>String</td>
<td>The ordering algorithm used for clustering. If not set, it is
chosen based on the number of clustering columns: 'order' for 1 column,
'zorder' for fewer than 5 columns, and 'hilbert' for 5 or more columns.</td>
</tr>
+ <tr>
+ <td><h5>clustering.incremental.mode</h5></td>
+ <td>'global-sort' or 'local-sort'</td>
+ <td style={{wordWrap: "break-word"}}>No</td>
+ <td>Enum</td>
+ <td>The sort mode for incremental clustering compaction. Default is
<code>global-sort</code>. <code>global-sort</code> performs a global range
shuffle across tasks before local sorting, so output files are globally ordered
by the clustering columns, at the cost of network shuffling.
<code>local-sort</code> skips the global shuffle and sorts rows only within
each compaction task independently, so each output file is internally ordered
but there is no global ordering across files. This mode [...]
+ </tr>
</tbody>
</table>
@@ -120,6 +127,14 @@ CALL sys.compact(table => 'T')
-- run incremental clustering with full mode, this will recluster all data
CALL sys.compact(table => 'T', compact_strategy => 'full')
+
+-- run incremental clustering with global-sort mode (default)
+-- performs a global range shuffle across tasks, so output files are globally ordered
+CALL sys.compact(table => 'T', options => 'clustering.incremental.mode=global-sort')
+
+-- run incremental clustering with local-sort mode
+-- sorts rows only within each task with no global shuffle; cheaper and sufficient for Parquet lookup optimizations
+CALL sys.compact(table => 'T', options => 'clustering.incremental.mode=local-sort')
```
</TabItem>
@@ -140,7 +155,24 @@ Run the following command to submit an incremental clustering job for the table.
[--catalog_conf <paimon-catalog-conf> [--catalog_conf
<paimon-catalog-conf> ...]]
```
-Example: run incremental clustering
+Example: run incremental clustering with global-sort mode (default). Output files are globally ordered across all tasks.
+
+```bash
+<FLINK_HOME>/bin/flink run \
+ /path/to/paimon-flink-action-@@VERSION@@.jar \
+ compact \
+ --warehouse s3:///path/to/warehouse \
+ --database test_db \
+ --table test_table \
+ --table_conf sink.parallelism=2 \
+ --table_conf clustering.incremental.mode=global-sort \
+ --compact_strategy minor \
+ --catalog_conf s3.endpoint=https://****.com \
+ --catalog_conf s3.access-key=***** \
+ --catalog_conf s3.secret-key=*****
+```
+
+Example: run incremental clustering with local-sort mode, which sorts rows only within each task without a global shuffle. This is cheaper and sufficient for Parquet lookup optimizations.
```bash
<FLINK_HOME>/bin/flink run \
@@ -150,6 +182,7 @@ Example: run incremental clustering
--database test_db \
--table test_table \
--table_conf sink.parallelism=2 \
+ --table_conf clustering.incremental.mode=local-sort \
--compact_strategy minor \
--catalog_conf s3.endpoint=https://****.com \
--catalog_conf s3.access-key=***** \