This is an automated email from the ASF dual-hosted git repository.

JingsongLi pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/paimon.git


The following commit(s) were added to refs/heads/master by this push:
     new 34050f0b82 [docs] Add clustering.incremental.mode documentation for incremental clustering (#8161)
34050f0b82 is described below

commit 34050f0b8200b531089001473c6bcc2baefedeb5
Author: sanshi <[email protected]>
AuthorDate: Mon Jun 8 17:47:15 2026 +0800

    [docs] Add clustering.incremental.mode documentation for incremental clustering (#8161)
---
 docs/docs/append-table/incremental-clustering.mdx | 35 ++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/docs/docs/append-table/incremental-clustering.mdx b/docs/docs/append-table/incremental-clustering.mdx
index b8a2108d40..54c146c4ba 100644
--- a/docs/docs/append-table/incremental-clustering.mdx
+++ b/docs/docs/append-table/incremental-clustering.mdx
@@ -82,6 +82,13 @@ To enable Incremental Clustering, the following configuration needs to be set fo
       <td>String</td>
       <td>The ordering algorithm used for clustering. If not set, It'll decided from the number of clustering columns. 'order' is used for 1 column, 'zorder' for less than 5 columns, and 'hilbert' for 5 or more columns.</td>
     </tr>
+    <tr>
+      <td><h5>clustering.incremental.mode</h5></td>
+      <td>'global-sort' or 'local-sort'</td>
+      <td style={{wordWrap: "break-word"}}>No</td>
+      <td>Enum</td>
+      <td>The sort mode for incremental clustering compaction. Default is <code>global-sort</code>. <code>global-sort</code> performs a global range shuffle across tasks before local sorting, output files are globally ordered by the clustering columns at the cost of network shuffling. <code>local-sort</code> skips the global shuffle and sorts rows only within each compaction task independently, each output file is internally ordered but there is no global ordering across files, this mode [...]
+    </tr>
     </tbody>
 
 </table>
@@ -120,6 +127,14 @@ CALL sys.compact(table => 'T')
 
 -- run incremental clustering with full mode, this will recluster all data
 CALL sys.compact(table => 'T', compact_strategy => 'full')
+
+-- run incremental clustering with global-sort mode (default)
+-- performs a global range shuffle across tasks, output files are globally ordered
+CALL sys.compact(table => 'T', options => 'clustering.incremental.mode=global-sort')
+
+-- run incremental clustering with local-sort mode
+-- sorts rows only within each task, no global shuffle, cheaper and sufficient for Parquet lookup optimizations
+CALL sys.compact(table => 'T', options => 'clustering.incremental.mode=local-sort')
 ```
 
 </TabItem>
@@ -140,7 +155,24 @@ Run the following command to submit a incremental clustering job for the table.
     [--catalog_conf <paimon-catalog-conf> [--catalog_conf <paimon-catalog-conf> ...]]
 ```
 
-Example: run incremental clustering
+Example: run incremental clustering with global-sort mode (default), output files are globally ordered across all tasks.
+
+```bash
+<FLINK_HOME>/bin/flink run \
+    /path/to/paimon-flink-action-@@VERSION@@.jar \
+    compact \
+    --warehouse s3:///path/to/warehouse \
+    --database test_db \
+    --table test_table \
+    --table_conf sink.parallelism=2 \
+    --table_conf clustering.incremental.mode=global-sort \
+    --compact_strategy minor \
+    --catalog_conf s3.endpoint=https://****.com \
+    --catalog_conf s3.access-key=***** \
+    --catalog_conf s3.secret-key=*****
+```
+
+Example: run incremental clustering with local-sort mode, sorts rows only within each task without global shuffle, cheaper and sufficient for Parquet lookup optimizations.
 
 ```bash
 <FLINK_HOME>/bin/flink run \
@@ -150,6 +182,7 @@ Example: run incremental clustering
     --database test_db \
     --table test_table \
     --table_conf sink.parallelism=2 \
+    --table_conf clustering.incremental.mode=local-sort \
     --compact_strategy minor \
     --catalog_conf s3.endpoint=https://****.com \
     --catalog_conf s3.access-key=***** \
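
Note on the diff above: the difference between the two documented modes can be illustrated with a small Python sketch. This is not Paimon's implementation. The function names `local_sort` and `global_sort`, the use of plain integers as clustering keys, and the way range boundaries are chosen (from a full sort instead of sampling) are all assumptions made for illustration only.

```python
from bisect import bisect_right


def local_sort(tasks):
    """Sort rows inside each task independently; no rows move between tasks."""
    return [sorted(rows) for rows in tasks]


def global_sort(tasks):
    """Range-shuffle rows across tasks by key, then sort inside each task."""
    num_tasks = len(tasks)
    all_rows = sorted(row for rows in tasks for row in rows)
    if num_tasks == 0 or not all_rows:
        return [[] for _ in tasks]

    # Assumption: pick num_tasks - 1 evenly spaced boundaries from the sorted
    # keys. A real engine would estimate boundaries from a sample instead.
    step = len(all_rows) / num_tasks
    boundaries = [all_rows[int(step * i)] for i in range(1, num_tasks)]

    # The "shuffle": route every row to the task that owns its key range.
    shuffled = [[] for _ in range(num_tasks)]
    for rows in tasks:
        for row in rows:
            shuffled[bisect_right(boundaries, row)].append(row)

    # Local sort after the shuffle yields files ordered across tasks.
    return [sorted(rows) for rows in shuffled]


tasks = [[5, 1, 9], [4, 8, 2], [7, 3, 6]]
print(local_sort(tasks))   # [[1, 5, 9], [2, 4, 8], [3, 6, 7]]
print(global_sort(tasks))  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

In the local-sort result every output has overlapping key ranges (1 to 9, 2 to 8, 3 to 7), while each is sorted internally. In the global-sort result the key ranges are disjoint, which is what the shuffle pays for.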
