leaves12138 commented on code in PR #7799:
URL: https://github.com/apache/paimon/pull/7799#discussion_r3216026309
##########
paimon-core/src/main/java/org/apache/paimon/operation/commit/StrictModeChecker.java:
##########
@@ -92,6 +101,29 @@ public void check(long newSnapshotId, CommitKind newCommitKind) {
         }
     }
+    private boolean hasOverlappedPartition(Snapshot snapshot, Set<BinaryRow> newPartitions) {
+        if (newPartitions.isEmpty()) {
+            return false;
+        }
+        Iterator<ManifestEntry> entries =
+                scan.withSnapshot(snapshot).withKind(ScanMode.DELTA).dropStats().readFileIterator();
Review Comment:
`FileStoreScan` is mutable: `onlyReadRealBuckets()` sets state on both
`AbstractFileStoreScan` and its `ManifestsReader`, and there is no reset in
this checker. During an OVERWRITE commit, once the loop sees an earlier APPEND
snapshot, the APPEND branch calls
`scan.withSnapshot(...).withKind(DELTA).onlyReadRealBuckets()...`. The same
`scan` instance is then reused here for later COMPACT/OVERWRITE snapshots, so
those scans also inherit `onlyReadRealBuckets=true`.
That can make strict mode miss an overlapping later COMPACT/OVERWRITE whose
files are not real-bucket files (for example bucket-unaware / postpone-bucket
entries). A sequence like APPEND snapshot -> OVERWRITE/COMPACT snapshot on the
same target partition -> new OVERWRITE can pass when it should fail.
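
To make the failure mode concrete, here is a minimal sketch of how a sticky filter on a shared, builder-style scan can hide a later overlap. All types here (`StickyScanSketch`, `FakeScan`, `Entry`) are hypothetical stand-ins written for illustration, not Paimon classes, and a negative bucket number is only an assumed way to model a non-real-bucket entry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in types; none of these are Paimon classes. */
final class StickyScanSketch {

    /** A manifest-like entry; bucket < 0 models a non-real-bucket file (assumption). */
    static final class Entry {
        final String partition;
        final int bucket;

        Entry(String partition, int bucket) {
            this.partition = partition;
            this.bucket = bucket;
        }
    }

    /** Builder-style scan whose bucket filter stays set once enabled. */
    static final class FakeScan {
        private List<Entry> entries = Collections.emptyList();
        private boolean onlyRealBuckets;

        FakeScan withSnapshot(List<Entry> snapshotEntries) {
            this.entries = snapshotEntries;
            return this;
        }

        FakeScan onlyReadRealBuckets() {
            this.onlyRealBuckets = true; // sticky: nothing ever clears it
            return this;
        }

        List<Entry> read() {
            List<Entry> result = new ArrayList<>();
            for (Entry e : entries) {
                if (!onlyRealBuckets || e.bucket >= 0) {
                    result.add(e);
                }
            }
            return result;
        }
    }

    /** Returns true if the snapshot touches the partition, as seen through the given scan. */
    static boolean overlaps(FakeScan scan, List<Entry> snapshot, String partition) {
        for (Entry e : scan.withSnapshot(snapshot).read()) {
            if (e.partition.equals(partition)) {
                return true;
            }
        }
        return false;
    }

    /** Replays APPEND then COMPACT on one shared scan and reports whether the overlap is seen. */
    static boolean compactOverlapSeenWithSharedScan() {
        List<Entry> append = Arrays.asList(new Entry("p1", 0));
        List<Entry> compact = Arrays.asList(new Entry("p1", -2)); // non-real-bucket file
        FakeScan shared = new FakeScan();
        shared.withSnapshot(append).onlyReadRealBuckets().read(); // APPEND branch sets the flag
        return overlaps(shared, compact, "p1"); // the flag leaks into this later check
    }

    public static void main(String[] args) {
        System.out.println("overlap seen: " + compactOverlapSeenWithSharedScan());
    }
}
```

In this sketch `compactOverlapSeenWithSharedScan()` returns `false` even though the COMPACT snapshot writes to `p1`, which mirrors the missed conflict described above.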

Please avoid sharing this mutated scan state between checks, e.g. create a
fresh `FileStoreScan` for each snapshot/check (pass a supplier into
`StrictModeChecker`) or otherwise reset the bucket filter before
`hasOverlappedPartition(snapshot, ...)`. It would also be good to add a
regression test where an APPEND snapshot is checked before a later overlapping
COMPACT/OVERWRITE snapshot.
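
A rough sketch of the supplier idea, under the same caveat: `FreshScanSketch`, `FakeScan`, `Entry`, and `Checker` are hypothetical stand-ins rather than Paimon's `FileStoreScan` or `StrictModeChecker`, and the real constructor wiring may well look different. The point is only that each check obtains its own scan, so a filter set in the APPEND branch cannot reach a later check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical stand-in types; none of these are Paimon classes. */
final class FreshScanSketch {

    /** A manifest-like entry; bucket < 0 models a non-real-bucket file (assumption). */
    static final class Entry {
        final String partition;
        final int bucket;

        Entry(String partition, int bucket) {
            this.partition = partition;
            this.bucket = bucket;
        }
    }

    /** Mutable scan with a sticky bucket filter, as in the problem description. */
    static final class FakeScan {
        private boolean onlyRealBuckets;

        FakeScan onlyReadRealBuckets() {
            this.onlyRealBuckets = true;
            return this;
        }

        List<Entry> read(List<Entry> snapshot) {
            List<Entry> result = new ArrayList<>();
            for (Entry e : snapshot) {
                if (!onlyRealBuckets || e.bucket >= 0) {
                    result.add(e);
                }
            }
            return result;
        }
    }

    /** Checker that asks the supplier for a new scan on every check. */
    static final class Checker {
        private final Supplier<FakeScan> scanFactory;

        Checker(Supplier<FakeScan> scanFactory) {
            this.scanFactory = scanFactory;
        }

        /** APPEND branch: only real-bucket files are considered. */
        boolean appendTouches(List<Entry> snapshot, String partition) {
            return contains(scanFactory.get().onlyReadRealBuckets().read(snapshot), partition);
        }

        /** COMPACT/OVERWRITE branch: every file is considered. */
        boolean hasOverlappedPartition(List<Entry> snapshot, String partition) {
            return contains(scanFactory.get().read(snapshot), partition);
        }

        private static boolean contains(List<Entry> entries, String partition) {
            for (Entry e : entries) {
                if (e.partition.equals(partition)) {
                    return true;
                }
            }
            return false;
        }
    }

    /** Regression-style scenario: APPEND is checked first, then an overlapping COMPACT. */
    static boolean compactOverlapSeenWithFreshScans() {
        Checker checker = new Checker(FakeScan::new);
        checker.appendTouches(Arrays.asList(new Entry("p1", 0)), "p1");
        return checker.hasOverlappedPartition(Arrays.asList(new Entry("p1", -2)), "p1");
    }
}
```

Here `compactOverlapSeenWithFreshScans()` returns `true`, which is the shape a regression test for this ordering could assert. Resetting the filter on a shared instance would also work, but a per-check instance avoids having to remember every piece of mutable state.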