[DISCUSS] PIP-36: Introduce Incremental Clustering for Paimon Append Table

lei li Fri, 19 Sep 2025 02:39:49 -0700

Hi everyone,

I'd like to start a discussion about PIP-36: Introduce Incremental Clustering
for Paimon Append Table [1].

Paimon currently supports ordering append tables using SFC (Space-Filling
Curve) [2]. The resulting data layout typically delivers better performance for
queries that filter on the clustering keys. However, the current SortCompact
rewrites the entire dataset on every run, even when neither the data nor the
clustering keys have changed, which is extremely costly. To address this, we
plan to introduce a more flexible, incremental mechanism: Incremental
Clustering. On each run, it selects only a specific subset of files to cluster,
avoiding a full rewrite. This enables low-cost, sort-based optimization of the
data layout and improves query performance. In addition, Incremental
Clustering lets you change clustering keys without rewriting existing data: the
layout evolves as clustering runs proceed and gradually converges to an
optimal state, which greatly simplifies decisions about data layout.
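For readers less familiar with SFC ordering, here is a toy Z-order (Morton) sketch in Python. It interleaves the bits of two non-negative integer clustering-key values into one sort key, so rows close in both dimensions tend to land close together in the sorted output. This only illustrates the general technique; it is not Paimon's implementation, and the function name `interleave_bits` and the 16-bit width are assumptions made for the example.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Toy Z-order (Morton) key: interleave the low `bits` bits of x and y."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # bit i of x goes to even position 2i
        z |= ((y >> i) & 1) << (2 * i + 1)  # bit i of y goes to odd position 2i+1
    return z


# Rows are (key_a, key_b) pairs; sorting by the Morton key clusters them
# along a Z-shaped curve instead of ordering by key_a alone.
rows = [(3, 1), (0, 0), (1, 1), (2, 2), (0, 3)]
rows_sorted = sorted(rows, key=lambda r: interleave_bits(r[0], r[1]))
print(rows_sorted)  # [(0, 0), (1, 1), (3, 1), (0, 3), (2, 2)]
```

Sorting by a single key would cluster well only on `key_a`. The interleaved key trades some locality in each dimension for reasonable locality in both, which is why SFC layouts help queries that filter on any of the clustering keys.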

Incremental Clustering supports:

* Incremental clustering, minimizing write amplification as much as
possible.
* Small-file compaction; rewrites respect target-file-size.
* Changing clustering keys; newly ingested data is clustered according to
the latest clustering keys.
* A full mode; when selected, the entire dataset is reclustered.
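To make the write-amplification trade-off concrete, here is a toy Python model comparing an incremental run with full mode. It assumes a deliberately simple, hypothetical selection policy: rewrite only files not produced by an earlier cluster run. The `DataFile` type, its `clustered` flag, and `plan_rewrite` are illustrative names, not part of Paimon or of the PIP-36 design, whose actual file-selection strategy is described in the proposal [1].

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    name: str
    size_mb: int
    clustered: bool  # hypothetical flag: written by a previous cluster run


def select_files(files, full_mode=False):
    """Files a run would rewrite under this toy policy (not PIP-36's)."""
    if full_mode:
        return list(files)  # full mode: recluster everything
    return [f for f in files if not f.clustered]  # incremental: new data only


def plan_rewrite(files, target_file_size_mb, full_mode=False):
    """Return (MB rewritten, number of output files of ~target size)."""
    selected = select_files(files, full_mode)
    total = sum(f.size_mb for f in selected)
    num_outputs = math.ceil(total / target_file_size_mb) if total else 0
    return total, num_outputs


files = [
    DataFile("f1", 128, True),
    DataFile("f2", 128, True),
    DataFile("f3", 10, False),
    DataFile("f4", 12, False),
    DataFile("f5", 8, False),
]
print(plan_rewrite(files, 128))                  # (30, 1)
print(plan_rewrite(files, 128, full_mode=True))  # (286, 3)
```

In this example the incremental run rewrites 30 MB and also compacts three small files into one output, while full mode rewrites all 286 MB.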

The detailed design and PoC results can be found in PIP-36 [1].

Looking forward to your feedback, thanks!

[1] https://cwiki.apache.org/confluence/display/PAIMON/PIP-36%3A+Introduce+Incremental+Clustering+for+Paimon+Append+Table
[2] https://paimon.apache.org/docs/master/maintenance/dedicated-compaction/#sort-compact

Best,

Lei Li
