leaves12138 commented on code in PR #7832:
URL: https://github.com/apache/paimon/pull/7832#discussion_r3259283483
##########
paimon-core/src/main/java/org/apache/paimon/operation/FileStoreCommitImpl.java:
##########
@@ -969,6 +969,10 @@ CommitResult tryCommitOnce(
deltaFiles = assigned.assignedEntries;
}
+ if (options.snapshotSequenceOrdering()) {
+ deltaFiles = stampSequenceWithSnapshotId(newSnapshotId,
commitKind, deltaFiles);
Review Comment:
I think this still has a correctness issue around where the snapshot-id
stamp is applied.
`stampSequenceWithSnapshotId` only rewrites the `ManifestEntry`s that are
written to the snapshot. The writer-side `DataFileMeta`s kept in
`MergeTreeWriter` / `Levels` are still the original file metas produced by the
rolling writer, whose min/max sequence numbers are local writer sequence
numbers. If the same writer later compacts those already-committed APPEND files
without being restored from the committed manifest, `KeyValueFileReaderFactory`
will recover `snapshotId` from `file.minSequenceNumber()` for APPEND files and
may treat that local sequence as the snapshot id.
A failing shape would be:
1. write enough records so the local sequence number becomes much larger
than the snapshot id;
2. commit an old version of key K (the manifest is stamped correctly, but
the writer-side `Levels` still has the local sequence in `DataFileMeta`);
3. trigger compaction in the same writer, which reads the APPEND file and
writes that local sequence into the compacted row's per-record
sequence/snapshot id;
4. write a later snapshot for key K. The older compacted row can now compare
as newer and win.
Could we either update the writer-side file metas after commit (or otherwise
avoid using unstamped in-memory metas for snapshot-ordering compaction), and
add a regression test where local sequence numbers are intentionally larger
than subsequent snapshot ids before compaction?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]