[PR] feat(table/write): merge partial-update rows at flush [paimon-rust]

via GitHub Thu, 11 Jun 2026 10:02:12 -0700


TheR1sing3un opened a new pull request, #380:
URL: https://github.com/apache/paimon-rust/pull/380


   ### Purpose
   
   The partial-update writer kept every row of a key in the flushed data file, 
deferring all merging to the read side. Java's 
`MergeTreeWriter#flushWriteBuffer` runs the merge function over the write 
buffer before flushing, so a data file never holds two rows of one key — an 
invariant that split planning and statistics rely on (a file's physical row 
count equals its logical row count).
   
   The missing invariant surfaced in 
https://github.com/apache/paimon-rust/pull/374#discussion_r3396440436: a 
single-file partial-update split marked raw convertible reported its physical 
row count as exact, inflating `COUNT(*)` through exact scan statistics and 
starving LIMIT pushdown. PR #374 works around it by keeping partial-update 
splits non-raw-convertible; this PR fixes the root cause so that gating can be 
relaxed in a follow-up.
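
To make the failure mode concrete, here is a minimal sketch of how an exact-count shortcut can over-report when a file holds several rows of one key. `Split`, `raw_convertible`, `file_row_counts`, and `exact_row_count` are hypothetical names chosen for illustration; they are not the paimon-rust types or the code discussed in #374.

```rust
/// Hypothetical split descriptor (illustrative only, not the paimon-rust type).
struct Split {
    /// True when the planner believes the files can be read without merging.
    raw_convertible: bool,
    /// Physical row count of each data file in the split.
    file_row_counts: Vec<u64>,
}

/// Returns a row count taken from file metadata, but only when the split is
/// raw convertible; otherwise the caller has to run the merge reader.
fn exact_row_count(split: &Split) -> Option<u64> {
    if split.raw_convertible {
        Some(split.file_row_counts.iter().sum())
    } else {
        None
    }
}
```

Under these assumptions, a single file holding three unmerged updates of one key reports `Some(3)` although the table logically contains one row. Once flush merges each key group, the same shortcut would report `Some(1)`.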
   
   ### Brief change log
   
   - `KeyValueFileWriter::flush` merges each key group down to one row for 
`merge-engine=partial-update`, mirroring Java 
`MergeTreeWriter#flushWriteBuffer` with the same semantics as the read-side 
`PartialUpdateMergeFunction`:
     - every column keeps its latest non-null value, with rows ordered by 
(sequence fields, system sequence); an all-null column stays null
     - the merged row carries the group's highest sequence number
     - DELETE / UPDATE_BEFORE rows are rejected, matching the read-side error
   - Changelog files (`changelog-producer=input`) still record the pre-merge 
rows, matching Java's `rawConsumer`.
   - Deduplicate / first-row flush behavior is unchanged; the key-grouping 
helper is now shared.
   - Cross-commit merging is unchanged: files from different commits still 
overlap on key range and go through the sort-merge reader.
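
The merge rules listed above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `Row`, `merge_group`, and `merge_buffer` are hypothetical names, columns are plain `Option<i64>`, and rows are ordered by system sequence only (user-defined sequence fields are left out).

```rust
use std::collections::BTreeMap;

/// Hypothetical row shape for illustration; not the paimon-rust types.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    key: i64,
    /// System sequence number; a higher value means a later write.
    seq: u64,
    /// True for DELETE / UPDATE_BEFORE rows.
    retract: bool,
    /// Non-key columns; `None` stands for SQL NULL.
    values: Vec<Option<i64>>,
}

/// Merges all rows of one key into a single row.
fn merge_group(mut rows: Vec<Row>) -> Result<Row, String> {
    if rows.is_empty() {
        return Err("empty key group".to_string());
    }
    // Retract rows are rejected rather than merged.
    if rows.iter().any(|r| r.retract) {
        return Err("partial-update cannot accept DELETE / UPDATE_BEFORE rows".to_string());
    }
    // Assumption: ordering by system sequence only.
    rows.sort_by_key(|r| r.seq);
    let width = rows[0].values.len();
    let mut merged = vec![None; width];
    for row in &rows {
        for (slot, value) in merged.iter_mut().zip(&row.values) {
            // A later non-null value overwrites; a null never does.
            if value.is_some() {
                *slot = *value;
            }
        }
    }
    // After sorting, the last row carries the group's highest sequence number.
    let last = rows.last().unwrap();
    Ok(Row { key: last.key, seq: last.seq, retract: false, values: merged })
}

/// Groups buffered rows by key and merges each group down to one row.
fn merge_buffer(rows: Vec<Row>) -> Result<Vec<Row>, String> {
    let mut groups: BTreeMap<i64, Vec<Row>> = BTreeMap::new();
    for row in rows {
        groups.entry(row.key).or_default().push(row);
    }
    groups.into_values().map(merge_group).collect()
}
```

In this sketch the output is sorted by key because of the `BTreeMap`, and any retract row fails the whole flush. Changelog output is not modeled here.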
   
   ### Tests
   
   - `test_merge_partial_update_rows_latest_non_null_per_column`: per-column 
latest-non-null across a key group, all-null column stays null, merged 
`_SEQUENCE_NUMBER` is the group max
   - `test_merge_partial_update_rows_rejects_retract`: DELETE rows are 
rejected with an error at flush, as on the read side
   - e2e `test_pk_partial_update_merges_within_single_commit`: three partial 
updates of one key in one INSERT produce a single physical row — SELECT and 
COUNT(*) agree
   - existing partial-update e2e (cross-commit field-wise merge) unchanged and 
green
   
   ### API and Format
   
   No API change. Data files written for partial-update tables now contain one 
row per key per flush (same physical schema); files written by older versions 
keep working — the reader still sort-merges every split.
   
   ### Documentation
   
   No documentation change needed.


