wzhero1 commented on code in PR #7027:
URL: https://github.com/apache/paimon/pull/7027#discussion_r2922827667
##########
paimon-flink/paimon-flink-common/src/main/java/org/apache/paimon/flink/action/ExpireSnapshotsAction.java:
##########
@@ -50,12 +95,168 @@ public ExpireSnapshotsAction(
this.olderThan = olderThan;
this.maxDeletes = maxDeletes;
this.options = options;
+ this.parallelism = parallelism;
}
+ /** Returns true if forceStartFlinkJob is enabled and parallelism is
greater than 1. */
+ private boolean isParallelMode() {
+ return forceStartFlinkJob && parallelism != null && parallelism > 1;
Review Comment:
Thanks for the suggestions! I've made --parallelism optional — when not
specified, it falls back to env.getParallelism(), so users can control it via
Flink configuration naturally. (Automatic parallelism inference based on
workload is out of scope for this PR.)
Regarding the dynamic task fetching approach (env.fromSequence(startId,
endId).flatMap(...)), the current design intentionally separates deletion into
two phases:
- Worker phase (parallel): deletes data files and changelog files
- Sink phase (serial): deletes manifests and snapshot metadata files
This separation is necessary because:
1. manifestSkippingSet is global and mutable: Built from all tags +
endSnapshot manifests. Parallel manifest deletion would require distributed
coordination to safely update this set — significantly more complex.
2. Serial deletion ensures consistency: Parallel snapshot deletion could
leave gaps (e.g., snapshots 1, 3, 5 exist but 2, 4 are deleted), which is
confusing for users. Serial execution in the sink ensures clean, contiguous
deletion.
3. Range partitioning maximizes cache locality: createDataFileSkipperForTags
caches tag data files internally. Adjacent snapshots typically share the same
nearest tag, so processing contiguous ranges avoids expensive cache rebuilds.
Dynamic task fetching (each subtask fetching the next available split) would
scatter snapshots across subtasks and break this cache benefit.
Data file deletion is the most time-consuming part and is safely
parallelizable. Manifest/snapshot deletion is lightweight but requires ordering
guarantees, so serial execution in the sink is the right trade-off.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]