Re: [PR] [core][flink] Support parallelism snapshot expire [paimon]

via GitHub Thu, 12 Mar 2026 00:38:52 -0700


wzhero1 commented on code in PR #7027:
URL: https://github.com/apache/paimon/pull/7027#discussion_r2922827667



##########
paimon-flink/paimon-flink-common/src/main/java/org/apache/paimon/flink/action/ExpireSnapshotsAction.java:
##########
@@ -50,12 +95,168 @@ public ExpireSnapshotsAction(
         this.olderThan = olderThan;
         this.maxDeletes = maxDeletes;
         this.options = options;
+        this.parallelism = parallelism;
     }
 
+    /** Returns true if forceStartFlinkJob is enabled and parallelism is 
greater than 1. */
+    private boolean isParallelMode() {
+        return forceStartFlinkJob && parallelism != null && parallelism > 1;

Review Comment:
     Thanks for the suggestions! I've made --parallelism optional — when not 
specified, it falls back to env.getParallelism(), so users can control it via 
Flink configuration naturally. (Automatic parallelism inference based on 
workload is out of scope for this PR.)
   
    Regarding the dynamic task fetching approach (env.fromSequence(startId, 
endId).flatMap(...)), the current design intentionally separates deletion into 
two phases:
   
     - Worker phase (parallel): deletes data files and changelog files
     - Sink phase (serial): deletes manifests and snapshot metadata files
   
     This separation is necessary because:
   
   1. manifestSkippingSet is global and mutable: Built from all tags + 
endSnapshot manifests. Parallel manifest deletion would require distributed 
coordination to safely update this set — significantly more complex.
   2. Serial deletion ensures consistency: Parallel snapshot deletion could 
leave gaps (e.g., snapshots 1, 3, 5 exist but 2, 4 are deleted), which is 
confusing for users. Serial execution in the sink ensures clean, contiguous
     deletion.
   3. Range partitioning maximizes cache locality: createDataFileSkipperForTags 
caches tag data files internally. Adjacent snapshots typically share the same 
nearest tag, so processing contiguous ranges avoids expensive cache rebuilds. 
Dynamic task fetching (each subtask fetching the next available split) would 
scatter snapshots across subtasks and break this cache benefit.
   Data file deletion is the most time-consuming part and is safely 
parallelizable. Manifest/snapshot deletion is lightweight but requires ordering 
guarantees, so serial execution in the sink is the right trade-off.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [core][flink] Support parallelism snapshot expire [paimon]

Reply via email to