JunRuiLee opened a new issue, #7806: URL: https://github.com/apache/paimon/issues/7806
### Search before asking

- [x] I searched in the [issues](https://github.com/apache/paimon/issues) and found nothing similar.

### Motivation

We run a dataset management platform on top of Paimon, preparing training data for LLM workloads. The platform manages **shared datasets**: multiple teams and pipelines (data cleaning, annotation, deduplication, feature extraction, etc.) read from and write to the same primary-key tables concurrently as part of their daily workflows.

Each writer assigns sequence numbers from its own independent counter, so **sequence numbers across different writers are simply incomparable**; they carry no cross-writer temporal meaning. The only reliable ordering signal across writers is the commit order itself (i.e., snapshot id).

Today Paimon resolves primary-key conflicts solely by sequence number (or `sequence.field`), which cannot express this cross-writer ordering. What we need is: **records committed in a later snapshot always win**, regardless of in-row sequence numbers. The sequence number serves as a secondary tiebreaker within the same snapshot.

### Solution

Add a table option `sequence.snapshot-ordering`. When enabled, merge uses the **commit snapshot id** as the primary tiebreaker for primary-key conflicts: records from later snapshots always win, with sequence number as the secondary tiebreaker within the same snapshot.

This follows the same file-level stamping pattern as `firstRowId` in row-tracking tables: a nullable `commitSnapshotId` field on `DataFileMeta`, assigned at commit time in `FileStoreCommitImpl`, with no changes to the write path or data file format.

### Design Details

1. **File-level stamp at commit time.** Add a nullable commit-snapshot-id field on data file metadata. At commit, stamp newly added files with the current snapshot id (the same injection point as row-id assignment).
2. **Propagate through compaction.** Compaction output inherits the max snapshot id of its inputs, so dedicated-compaction jobs preserve correct ordering.
3. **Inflate to record at read time.** When reading a data file, the file-level stamp is carried onto every key-value record produced from that file, making the snapshot id available to the merge layer without changing the data file format.
4. **Conditional comparator injection.** The sort-merge reader uses the record-level snapshot id as primary tiebreaker only when the option is enabled, so there is zero overhead for other tables.

### Implementation Plan

Will be split into 3 PRs:

1. **Infrastructure:** Add `_COMMIT_SNAPSHOT_ID` nullable field to `DataFileMeta` and `KeyValue`, bump `DataSplit` / `CommitMessageSerializer` versions with legacy serializer. Pure plumbing, no behavior change.
2. **Write path:** Add `sequence.snapshot-ordering` option with validation, stamp files at commit time in `FileStoreCommitImpl`, propagate max snapshot id through compaction.
3. **Read path:** Inject snapshot-id tiebreaker into sort-merge reader comparators (conditional on option), add unit and integration tests.

### Anything else?

_No response_

### Are you willing to submit a PR?

- [x] I'm willing to submit a PR!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
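
To make the ordering described under Design Details concrete, here is a minimal, self-contained Java sketch of the proposed merge rule and the compaction stamp. It is not Paimon code: `Rec`, `mergeOrder`, `winner`, and `compactedStamp` are hypothetical names chosen for illustration, and the handling of unstamped (null) files is an assumption that the proposal does not specify.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

final class SnapshotOrderingSketch {

    /** Hypothetical stand-in for a key-value record that carries its file's stamp. */
    record Rec(String key, long sequenceNumber, Long commitSnapshotId, String value) {}

    /** Builds the merge ordering; the greatest record under it wins the key conflict. */
    static Comparator<Rec> mergeOrder(boolean snapshotOrdering) {
        Comparator<Rec> bySequence = Comparator.comparingLong(Rec::sequenceNumber);
        if (!snapshotOrdering) {
            return bySequence; // option disabled: sequence number only
        }
        // Assumption: records from unstamped (null) files sort before stamped ones.
        Comparator<Rec> bySnapshot = Comparator.comparing(
                Rec::commitSnapshotId, Comparator.nullsFirst(Comparator.naturalOrder()));
        // Snapshot id decides first; sequence number breaks ties within a snapshot.
        return bySnapshot.thenComparing(bySequence);
    }

    /** Picks the surviving record among records that share one primary key. */
    static Rec winner(List<Rec> sameKey, boolean snapshotOrdering) {
        return Collections.max(sameKey, mergeOrder(snapshotOrdering));
    }

    /** Compaction output takes the max stamp of its inputs (null if none are stamped). */
    static Long compactedStamp(List<Long> inputStamps) {
        return inputStamps.stream()
                .filter(Objects::nonNull)
                .max(Comparator.naturalOrder())
                .orElse(null);
    }

    public static void main(String[] args) {
        // Writer A has a large private counter but committed first (snapshot 5);
        // writer B has a small counter but committed later (snapshot 6).
        Rec a = new Rec("k", 100L, 5L, "from-A");
        Rec b = new Rec("k", 3L, 6L, "from-B");
        System.out.println(winner(List.of(a, b), false).value()); // prints "from-A"
        System.out.println(winner(List.of(a, b), true).value());  // prints "from-B"
        System.out.println(compactedStamp(Arrays.asList(5L, null, 7L))); // prints "7"
    }
}
```

With the option off, writer A's larger private counter wins even though it committed earlier, which is the cross-writer problem the issue describes; with the option on, the later commit wins regardless of sequence number.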
