This is an automated email from the ASF dual-hosted git repository.
lzljs3620320 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/paimon.git
The following commit(s) were added to refs/heads/master by this push:
new 5e91e9f7b2 [docs] add docs for auto-clustering of historical partitions (#6516)
5e91e9f7b2 is described below
commit 5e91e9f7b23c2e19baa0dc99881e4598f0b38f29
Author: LsomeYeah <[email protected]>
AuthorDate: Mon Nov 3 16:02:22 2025 +0800
[docs] add docs for auto-clustering of historical partitions (#6516)
---
.../content/append-table/incremental-clustering.md | 37 ++++++++++++++++++++++
.../shortcodes/generated/core_configuration.html | 2 +-
.../main/java/org/apache/paimon/CoreOptions.java | 3 +-
3 files changed, 39 insertions(+), 3 deletions(-)
diff --git a/docs/content/append-table/incremental-clustering.md b/docs/content/append-table/incremental-clustering.md
index aa72a348fc..0ca3462e16 100644
--- a/docs/content/append-table/incremental-clustering.md
+++ b/docs/content/append-table/incremental-clustering.md
@@ -164,6 +164,43 @@ You can use `-D execution.runtime-mode=batch` or `-yD execution.runtime-mode=bat
{{< /tabs >}}
+## Auto-Clustering For Historical Partition
+While performing incremental clustering on recently active partitions, Paimon can automatically detect historical and
+inactive partitions and evaluate whether their data layout has reached an optimal state.
+For those historical partitions that have not yet achieved optimal layout, Paimon will also perform full clustering on them
+during the same operation, thereby improving their query performance.
+
+To enable auto-clustering for historical partitions, the following configuration needs to be set for the table:
+<table class="table table-bordered">
+ <thead>
+ <tr>
+ <th class="text-left" style="width: 20%">Option</th>
+ <th class="text-left" style="width: 10%">Value</th>
+ <th class="text-left" style="width: 5%">Required</th>
+ <th class="text-left" style="width: 10%">Type</th>
+ <th class="text-left" style="width: 55%">Description</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td><h5>clustering.history-partition.idle-to-full-sort</h5></td>
+ <td>3d</td>
+ <td style="word-wrap: break-word;">Yes</td>
+ <td>Duration</td>
+ <td>The duration after which a partition without new updates is considered a historical partition. Default is null.</td>
+ </tr>
+ <tr>
+ <td><h5>clustering.history-partition.limit</h5></td>
+ <td>5</td>
+ <td style="word-wrap: break-word;">Yes</td>
+ <td>Integer</td>
+ <td>The limit of history partition number for automatically performing full clustering. Default value is 5.</td>
+ </tr>
+ </tbody>
+
+</table>
+
+
## Implement
To balance write amplification and sorting effectiveness, Paimon leverages the LSM Tree notion of levels to stratify data files
and uses the Universal Compaction strategy to select files for clustering.
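
The selection rule described by the two options documented above (`clustering.history-partition.idle-to-full-sort` and `clustering.history-partition.limit`) can be illustrated with a small standalone sketch. This is not Paimon's implementation: the class `HistoryPartitionSketch`, the method `pickHistoryPartitions`, the map of last-update times, and the oldest-first ordering are all assumptions made for illustration, and the check of whether a partition's layout is already optimal is left out.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not part of the Paimon code base. */
public class HistoryPartitionSketch {

    /**
     * Returns the partitions that have been idle for longer than idleToFullSort,
     * capped at limit entries. Ordering oldest-first is an assumption of this sketch.
     */
    static List<String> pickHistoryPartitions(
            Map<String, Instant> lastUpdate, Instant now, Duration idleToFullSort, int limit) {
        List<String> picked = new ArrayList<>();
        lastUpdate.entrySet().stream()
                // keep partitions whose idle time strictly exceeds the threshold
                .filter(e -> Duration.between(e.getValue(), now).compareTo(idleToFullSort) > 0)
                // assumption: consider the longest-idle partitions first
                .sorted(Map.Entry.comparingByValue())
                // analogue of clustering.history-partition.limit
                .limit(limit)
                .forEach(e -> picked.add(e.getKey()));
        return picked;
    }
}
```

With an idle threshold of `3d` and a limit of `5`, a partition last written 12 hours ago would be skipped, while partitions last written 6 and 14 days ago would both be selected in this sketch.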
diff --git a/docs/layouts/shortcodes/generated/core_configuration.html b/docs/layouts/shortcodes/generated/core_configuration.html
index d26fa9b59d..7f7bb6816d 100644
--- a/docs/layouts/shortcodes/generated/core_configuration.html
+++ b/docs/layouts/shortcodes/generated/core_configuration.html
@@ -168,7 +168,7 @@ under the License.
<td><h5>clustering.history-partition.idle-to-full-sort</h5></td>
<td style="word-wrap: break-word;">(none)</td>
<td>Duration</td>
- <td>The duration after which a partition without new updates is considered a historical partition. Historical partitions will be automatically fully clustered during the cluster operation.This option takes effects when 'clustering.history-partition.auto.enabled' is true.</td>
+ <td>The duration after which a partition without new updates is considered a historical partition. Historical partitions will be automatically fully clustered during the cluster operation.</td>
</tr>
<tr>
<td><h5>clustering.history-partition.limit</h5></td>
diff --git a/paimon-api/src/main/java/org/apache/paimon/CoreOptions.java b/paimon-api/src/main/java/org/apache/paimon/CoreOptions.java
index f5c90b4196..8680017027 100644
--- a/paimon-api/src/main/java/org/apache/paimon/CoreOptions.java
+++ b/paimon-api/src/main/java/org/apache/paimon/CoreOptions.java
@@ -1970,8 +1970,7 @@ public class CoreOptions implements Serializable {
.noDefaultValue()
.withDescription(
"The duration after which a partition without new updates is considered a historical partition. "
- + "Historical partitions will be automatically fully clustered during the cluster operation."
- + "This option takes effects when 'clustering.history-partition.auto.enabled' is true.");
+ + "Historical partitions will be automatically fully clustered during the cluster operation.");
public static final ConfigOption<Boolean> ROW_TRACKING_ENABLED =
key("row-tracking.enabled")