[I] [Feature] Support snapshot-based sequence ordering for primary-key tables [paimon]

via GitHub Sun, 10 May 2026 09:08:44 -0700


JunRuiLee opened a new issue, #7806:
URL: https://github.com/apache/paimon/issues/7806


   ### Search before asking
   
   - [x] I searched in the [issues](https://github.com/apache/paimon/issues) 
and found nothing similar.
   
   
   ### Motivation
   
   We run a dataset management platform on top of Paimon, preparing training 
data for LLM workloads. The platform manages **shared datasets** — multiple 
teams and pipelines (data cleaning, annotation, deduplication, feature 
extraction, etc.) read from and write to the same primary-key tables 
concurrently as part of their daily workflows.
   
   Each writer assigns sequence numbers from its own independent counter, so 
**sequence numbers from different writers are not comparable**: they carry no 
cross-writer temporal meaning. The only reliable ordering signal across writers 
is the commit order itself (i.e., snapshot id). Today Paimon resolves 
primary-key conflicts solely by sequence number (or `sequence.field`), which 
cannot express this cross-writer ordering.
   
   What we need is: **records committed in a later snapshot always win**, 
regardless of in-row sequence numbers. The sequence number serves only as a 
tiebreaker within the same snapshot.
   
   
   ### Solution
   
   Add a table option `sequence.snapshot-ordering`. When enabled, merge resolves 
primary-key conflicts by **commit snapshot id** first: records from later 
snapshots always win, and the sequence number only breaks ties within the same 
snapshot. This follows the same file-level stamping pattern as `firstRowId` in 
row-tracking tables: a nullable `commitSnapshotId` field on `DataFileMeta`, 
assigned at commit time in `FileStoreCommitImpl`, with no changes to the write 
path or data file format.
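
To make the ordering rule concrete, here is a minimal sketch of the two merge orders applied to one primary-key conflict. `Rec`, `BY_SEQUENCE`, `BY_SNAPSHOT_THEN_SEQUENCE` and `winner` are hypothetical names used for illustration only; they are not Paimon classes or APIs, and the snapshot ids and sequence numbers are made-up values.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SnapshotOrderingSketch {

    /** Hypothetical stand-in for a merged record; not Paimon's KeyValue class. */
    static final class Rec {
        final String value;
        final long sequenceNumber;
        final long commitSnapshotId;

        Rec(String value, long sequenceNumber, long commitSnapshotId) {
            this.value = value;
            this.sequenceNumber = sequenceNumber;
            this.commitSnapshotId = commitSnapshotId;
        }
    }

    // Sequence-only ordering: the highest sequence number wins.
    static final Comparator<Rec> BY_SEQUENCE =
            Comparator.comparingLong(r -> r.sequenceNumber);

    // Proposed ordering: the later snapshot wins, sequence number breaks ties.
    static final Comparator<Rec> BY_SNAPSHOT_THEN_SEQUENCE =
            Comparator.<Rec>comparingLong(r -> r.commitSnapshotId)
                    .thenComparingLong(r -> r.sequenceNumber);

    // The winner of a conflict is the greatest record under the given order.
    static Rec winner(List<Rec> sameKey, Comparator<Rec> order) {
        return Collections.max(sameKey, order);
    }

    public static void main(String[] args) {
        // Writer A committed first (snapshot 10), but its counter is far ahead.
        Rec fromA = new Rec("old", 5000L, 10L);
        // Writer B committed later (snapshot 11) with a fresh counter.
        Rec fromB = new Rec("new", 3L, 11L);
        List<Rec> conflict = Arrays.asList(fromA, fromB);

        System.out.println(winner(conflict, BY_SEQUENCE).value);               // prints "old"
        System.out.println(winner(conflict, BY_SNAPSHOT_THEN_SEQUENCE).value); // prints "new"
    }
}
```

With sequence-only ordering the earlier commit wins because its writer's counter happens to be larger; with the snapshot id compared first, the later commit wins regardless of the counters.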
   
   ### Design Details
   
   1. **File-level stamp at commit time.** Add a nullable commit-snapshot-id 
field on data file metadata. At commit, stamp newly added files with the 
current snapshot id, at the same injection point as row-id assignment.
   
   2. **Propagate through compaction.** Compaction output inherits the max 
snapshot id of its inputs, so dedicated-compaction jobs preserve correct 
ordering.
   
   3. **Inflate to record at read time.** When reading a data file, the 
file-level stamp is carried onto every key-value record produced from that 
file, making the snapshot id available to the merge layer without changing the 
data file format.
   
   4. **Conditional comparator injection.** The sort-merge reader compares the 
record-level snapshot id ahead of the sequence number only when the option is 
enabled, adding no overhead for other tables.
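
The four steps above can be sketched end to end as follows. This is only an illustration under stated assumptions: `FileMeta`, `Rec`, `stampAtCommit`, `compact`, `read` and `mergeOrder` are hypothetical names, not Paimon's `DataFileMeta`, `KeyValue` or `FileStoreCommitImpl`. The sketch also assumes that compaction outputs arrive at commit already stamped and that unstamped records sort before stamped ones; the proposal does not specify either point.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SnapshotStampSketch {

    /** Hypothetical file metadata; the stamp is null until the file is committed. */
    static final class FileMeta {
        final String name;
        final Long commitSnapshotId;

        FileMeta(String name, Long commitSnapshotId) {
            this.name = name;
            this.commitSnapshotId = commitSnapshotId;
        }
    }

    /** Hypothetical record as seen by the merge layer. */
    static final class Rec {
        final long sequenceNumber;
        final Long snapshotId;

        Rec(long sequenceNumber, Long snapshotId) {
            this.sequenceNumber = sequenceNumber;
            this.snapshotId = snapshotId;
        }
    }

    // Step 1: at commit, stamp files that do not carry a snapshot id yet.
    // Assumption: compaction outputs are already stamped and are left alone.
    static FileMeta stampAtCommit(FileMeta file, long snapshotId) {
        return file.commitSnapshotId != null ? file : new FileMeta(file.name, snapshotId);
    }

    // Step 2: compaction output inherits the max snapshot id of its inputs.
    // Unstamped inputs are skipped; the proposal does not say how they are handled.
    static FileMeta compact(String outputName, List<FileMeta> inputs) {
        Long max = null;
        for (FileMeta input : inputs) {
            Long id = input.commitSnapshotId;
            if (id != null && (max == null || id > max)) {
                max = id;
            }
        }
        return new FileMeta(outputName, max);
    }

    // Step 3: every record read from a file carries that file's stamp.
    static List<Rec> read(FileMeta file, long... sequenceNumbers) {
        List<Rec> records = new ArrayList<>();
        for (long seq : sequenceNumbers) {
            records.add(new Rec(seq, file.commitSnapshotId));
        }
        return records;
    }

    // Step 4: the snapshot comparison is added only when the option is enabled.
    // Assumption: records without a stamp sort before stamped ones.
    static Comparator<Rec> mergeOrder(boolean snapshotOrdering) {
        Comparator<Rec> bySequence = Comparator.comparingLong(r -> r.sequenceNumber);
        if (!snapshotOrdering) {
            return bySequence;
        }
        Comparator<Rec> bySnapshot =
                Comparator.<Rec, Long>comparing(
                        r -> r.snapshotId,
                        Comparator.nullsFirst(Comparator.<Long>naturalOrder()));
        return bySnapshot.thenComparing(bySequence);
    }

    public static void main(String[] args) {
        FileMeta early = stampAtCommit(new FileMeta("file-a", null), 10L);
        FileMeta late = stampAtCommit(new FileMeta("file-b", null), 11L);
        FileMeta compacted = compact("file-c", Arrays.asList(early, late));
        System.out.println(compacted.commitSnapshotId); // prints 11

        List<Rec> sameKey = new ArrayList<>(read(early, 5000L));
        sameKey.addAll(read(late, 3L));
        System.out.println(Collections.max(sameKey, mergeOrder(true)).sequenceNumber);  // prints 3
        System.out.println(Collections.max(sameKey, mergeOrder(false)).sequenceNumber); // prints 5000
    }
}
```

Because `mergeOrder(false)` returns the plain sequence comparator, tables that leave the option off never touch the snapshot id in this sketch.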
   
   
   ### Implementation Plan
   
   The work will be split into 3 PRs:
   
   1. **Infrastructure:** Add `_COMMIT_SNAPSHOT_ID` nullable field to 
`DataFileMeta` and `KeyValue`, bump `DataSplit` / `CommitMessageSerializer` 
versions with legacy serializer. Pure plumbing, no behavior change.
   
   2. **Write path:** Add `sequence.snapshot-ordering` option with 
validation, stamp files at commit time in `FileStoreCommitImpl`, propagate max 
snapshot id through compaction.
   
   3. **Read path:** Inject snapshot-id tiebreaker into sort-merge reader 
comparators (conditional on option), add unit and integration tests.
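
As a rough illustration of the version bump in the first PR, the sketch below shows the general pattern of a versioned serializer that still reads a legacy layout and leaves the new nullable field unset for old payloads. Everything here is hypothetical: the class, the version numbers and the byte layout are invented for the example and do not reflect Paimon's actual `DataSplit` or `CommitMessageSerializer` formats.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class VersionedMetaSketch {

    // Hypothetical version numbers, not the real serializer versions.
    static final int LEGACY_VERSION = 1;
    static final int CURRENT_VERSION = 2;

    /** Hypothetical, heavily reduced file metadata. */
    static final class FileMeta {
        final String fileName;
        final Long commitSnapshotId; // null when read from the legacy layout

        FileMeta(String fileName, Long commitSnapshotId) {
            this.fileName = fileName;
            this.commitSnapshotId = commitSnapshotId;
        }
    }

    // Current layout: version, file name, presence flag, optional snapshot id.
    static byte[] serialize(FileMeta meta) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(CURRENT_VERSION);
            out.writeUTF(meta.fileName);
            out.writeBoolean(meta.commitSnapshotId != null);
            if (meta.commitSnapshotId != null) {
                out.writeLong(meta.commitSnapshotId);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Legacy layout: version and file name only. Used here to emulate old payloads.
    static byte[] serializeLegacy(String fileName) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(LEGACY_VERSION);
            out.writeUTF(fileName);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Dispatch on the version prefix; legacy payloads yield a null snapshot id.
    static FileMeta deserialize(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int version = in.readInt();
            String fileName = in.readUTF();
            if (version == LEGACY_VERSION) {
                return new FileMeta(fileName, null);
            }
            if (version != CURRENT_VERSION) {
                throw new IllegalArgumentException("Unsupported version: " + version);
            }
            Long snapshotId = in.readBoolean() ? Long.valueOf(in.readLong()) : null;
            return new FileMeta(fileName, snapshotId);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping the field nullable lets payloads written before the bump deserialize without a stamp, which fits the "pure plumbing, no behavior change" goal of that PR.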
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [x] I'm willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
