leaves12138 commented on code in PR #7832:
URL: https://github.com/apache/paimon/pull/7832#discussion_r3259283483


##########
paimon-core/src/main/java/org/apache/paimon/operation/FileStoreCommitImpl.java:
##########
@@ -969,6 +969,10 @@ CommitResult tryCommitOnce(
                 deltaFiles = assigned.assignedEntries;
             }
 
+            if (options.snapshotSequenceOrdering()) {
+                deltaFiles = stampSequenceWithSnapshotId(newSnapshotId, 
commitKind, deltaFiles);

Review Comment:
   I think this still has a correctness issue around where the snapshot-id 
stamp is applied.
   
   `stampSequenceWithSnapshotId` only rewrites the `ManifestEntry`s that are 
written to the snapshot. The writer-side `DataFileMeta`s kept in 
`MergeTreeWriter` / `Levels` are still the original file metas produced by the 
rolling writer, whose min/max sequence numbers are local writer sequence 
numbers. If the same writer later compacts those already-committed APPEND files 
without being restored from the committed manifest, `KeyValueFileReaderFactory` 
will recover `snapshotId` from `file.minSequenceNumber()` for APPEND files and 
may treat that local sequence as the snapshot id.
   
   A failing shape would be:
   1. write enough records so the local sequence number becomes much larger 
than the snapshot id;
   2. commit an old version of key K (the manifest is stamped correctly, but 
the writer-side `Levels` still has the local sequence in `DataFileMeta`);
   3. trigger compaction in the same writer, which reads the APPEND file and 
writes that local sequence into the compacted row's per-record 
sequence/snapshot id;
   4. write a later snapshot for key K. The older compacted row can now compare 
as newer and win.
   
   Could we either update the writer-side file metas after commit (or otherwise 
avoid using unstamped in-memory metas for snapshot-ordering compaction), and 
add a regression test where local sequence numbers are intentionally larger 
than subsequent snapshot ids before compaction?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to