leaves12138 commented on code in PR #7799:
URL: https://github.com/apache/paimon/pull/7799#discussion_r3216026309


##########
paimon-core/src/main/java/org/apache/paimon/operation/commit/StrictModeChecker.java:
##########
@@ -92,6 +101,29 @@ public void check(long newSnapshotId, CommitKind newCommitKind) {
         }
     }
 
+    private boolean hasOverlappedPartition(Snapshot snapshot, Set<BinaryRow> newPartitions) {
+        if (newPartitions.isEmpty()) {
+            return false;
+        }
+        Iterator<ManifestEntry> entries =
+                scan.withSnapshot(snapshot).withKind(ScanMode.DELTA).dropStats().readFileIterator();

Review Comment:
   `FileStoreScan` is mutable: `onlyReadRealBuckets()` sets state on both 
`AbstractFileStoreScan` and its `ManifestsReader`, and there is no reset in 
this checker. During an OVERWRITE commit, once the loop sees an earlier APPEND 
snapshot, the APPEND branch calls 
`scan.withSnapshot(...).withKind(DELTA).onlyReadRealBuckets()...`. The same 
`scan` instance is then reused here for later COMPACT/OVERWRITE snapshots, so 
those scans also inherit `onlyReadRealBuckets=true`.
   
   That can make strict mode miss a later overlapping COMPACT/OVERWRITE snapshot 
whose files are not real-bucket files (for example, bucket-unaware or 
postpone-bucket entries). For instance, the sequence APPEND snapshot -> 
OVERWRITE/COMPACT snapshot on the same target partition -> new OVERWRITE commit 
can pass the check when it should fail.
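
   To make the failure mode concrete, here is a minimal toy model of the leak. It is only a sketch: `ToyScan`, `ToyEntry`, and `SharedScanLeakDemo` are hypothetical stand-ins, not Paimon classes, and treating a negative bucket id as a "non-real bucket" is an assumption made for illustration.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a manifest entry; not Paimon's ManifestEntry.
class ToyEntry {
    final int partition;
    final int bucket; // assumption of this sketch: negative means "not a real bucket"

    ToyEntry(int partition, int bucket) {
        this.partition = partition;
        this.bucket = bucket;
    }
}

// Hypothetical stand-in for a mutable scan; not Paimon's FileStoreScan.
class ToyScan {
    private boolean onlyRealBuckets = false; // sticky: nothing ever resets it

    ToyScan onlyReadRealBuckets() {
        this.onlyRealBuckets = true;
        return this;
    }

    // True if any entry visible to this scan belongs to the given partition.
    boolean touchesPartition(List<ToyEntry> entries, int partition) {
        for (ToyEntry e : entries) {
            if (onlyRealBuckets && e.bucket < 0) {
                continue; // filtered out by the sticky bucket filter
            }
            if (e.partition == partition) {
                return true;
            }
        }
        return false;
    }
}

class SharedScanLeakDemo {
    static final List<ToyEntry> APPEND_SNAPSHOT =
            Collections.singletonList(new ToyEntry(1, 0));
    static final List<ToyEntry> COMPACT_SNAPSHOT =
            Collections.singletonList(new ToyEntry(1, -2));

    // Baseline: an untouched scan sees the overlap in the COMPACT snapshot.
    static boolean overlapSeenByUntouchedScan() {
        return new ToyScan().touchesPartition(COMPACT_SNAPSHOT, 1);
    }

    // Shared instance: the APPEND branch mutates the scan, then the
    // overlap check for the later COMPACT snapshot reuses it.
    static boolean overlapSeenAfterAppendBranch() {
        ToyScan scan = new ToyScan();
        scan.onlyReadRealBuckets().touchesPartition(APPEND_SNAPSHOT, 1);
        return scan.touchesPartition(COMPACT_SNAPSHOT, 1);
    }
}
```

   In this toy model the untouched scan reports the overlap, while the shared instance silently skips the COMPACT entry once the APPEND branch has run.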
   
   Please avoid sharing this mutated scan state between checks, e.g. create a 
fresh `FileStoreScan` for each snapshot/check (pass a supplier into 
`StrictModeChecker`) or otherwise reset the bucket filter before 
`hasOverlappedPartition(snapshot, ...)`. It would also be good to add a 
regression test where an APPEND snapshot is checked before a later overlapping 
COMPACT/OVERWRITE snapshot.
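
   One possible shape for the supplier-based option, again as a hedged sketch rather than a proposed patch: `SketchScan`, `FileEntry`, `SupplierBackedChecker`, and the method names below are hypothetical, and the real `StrictModeChecker` constructor and scan API may need a different signature. The point is only that each check obtains its own scan, so `onlyReadRealBuckets()` cannot leak between snapshots.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for a manifest entry; not Paimon's ManifestEntry.
class FileEntry {
    final int partition;
    final int bucket; // assumption of this sketch: negative means "not a real bucket"

    FileEntry(int partition, int bucket) {
        this.partition = partition;
        this.bucket = bucket;
    }
}

// Hypothetical stand-in for a mutable scan; not Paimon's FileStoreScan.
class SketchScan {
    private boolean onlyRealBuckets;

    SketchScan onlyReadRealBuckets() {
        this.onlyRealBuckets = true;
        return this;
    }

    boolean touchesPartition(List<FileEntry> entries, int partition) {
        for (FileEntry e : entries) {
            if (onlyRealBuckets && e.bucket < 0) {
                continue;
            }
            if (e.partition == partition) {
                return true;
            }
        }
        return false;
    }
}

// Checker that asks a supplier for a fresh scan on every check.
class SupplierBackedChecker {
    private final Supplier<SketchScan> newScan;

    SupplierBackedChecker(Supplier<SketchScan> newScan) {
        this.newScan = newScan;
    }

    // APPEND branch: configures its own scan, which is then discarded.
    boolean appendTouchesPartition(List<FileEntry> snapshot, int partition) {
        return newScan.get().onlyReadRealBuckets().touchesPartition(snapshot, partition);
    }

    // COMPACT/OVERWRITE branch: starts from an unfiltered scan.
    boolean hasOverlappedPartition(List<FileEntry> snapshot, int partition) {
        return newScan.get().touchesPartition(snapshot, partition);
    }
}

class SupplierBackedCheckerDemo {
    // Regression-style scenario: APPEND is checked first, then a later
    // overlapping COMPACT snapshot with a non-real-bucket entry.
    static boolean laterCompactIsDetected() {
        SupplierBackedChecker checker = new SupplierBackedChecker(SketchScan::new);
        List<FileEntry> append = Collections.singletonList(new FileEntry(1, 0));
        List<FileEntry> compact = Collections.singletonList(new FileEntry(1, -2));
        checker.appendTouchesPartition(append, 1);
        return checker.hasOverlappedPartition(compact, 1);
    }
}
```

   `laterCompactIsDetected()` mirrors the requested regression scenario: with a fresh scan per check, the overlap on the later COMPACT snapshot is still reported after the APPEND branch has run.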



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

