[PR] [python] Implement first-row merge engine for read path [paimon]

via GitHub Mon, 25 May 2026 20:00:49 -0700


JunRuiLee opened a new pull request, #7968:
URL: https://github.com/apache/paimon/pull/7968


   ### Purpose
   
   Add read-path support for the `first-row` merge engine in pypaimon. The 
first-row engine keeps only the earliest row per primary key, whereas the 
default `deduplicate` engine keeps the latest.
   
   Previously, reading a table configured with `merge-engine: first-row` raised 
`NotImplementedError`. This PR implements the merge function and wires it into 
the read pipeline.
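
For readers unfamiliar with the two engines, the difference can be illustrated with a minimal sketch in plain Python. It does not use any pypaimon API; the `first_row` and `deduplicate` helpers and the `(key, value)` row shape are hypothetical and exist only for this illustration. Rows are assumed to arrive in write order.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical row shape for illustration: (primary key, value),
# listed in the order the rows were written.
Row = Tuple[int, str]


def first_row(rows: Iterable[Row]) -> Dict[int, str]:
    """Keep the earliest value seen for each primary key."""
    result: Dict[int, str] = {}
    for key, value in rows:
        # setdefault only stores the value if the key is not present yet.
        result.setdefault(key, value)
    return result


def deduplicate(rows: Iterable[Row]) -> Dict[int, str]:
    """Keep the latest value seen for each primary key."""
    result: Dict[int, str] = {}
    for key, value in rows:
        # Later rows overwrite earlier ones.
        result[key] = value
    return result


rows = [(1, "a"), (2, "b"), (1, "c")]
print(first_row(rows))    # {1: 'a', 2: 'b'}
print(deduplicate(rows))  # {1: 'c', 2: 'b'}
```

Key `1` is written twice: the first-row sketch keeps `"a"`, while the deduplicate sketch keeps `"c"`.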
   
   ### Changes
   
   - Add `FirstRowMergeFunction` that snapshots the first add-type row per key, 
with `ignore-delete` support matching Java semantics
   - Add `CoreOptions.ignore_delete()` with fallback key resolution 
(`ignore-delete`, `first-row.ignore-delete`, `deduplicate.ignore-delete`, 
`partial-update.ignore-delete`)
   - Update `merge_engine_support.check_supported()` and 
`MergeFileSplitRead._build_merge_function()` to accept and dispatch `FIRST_ROW`
   - Update existing `test_first_row_engine_raises_not_implemented` to verify 
correct first-row behavior instead
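
The first two changes above can be sketched roughly as follows. This is a simplified, standalone illustration and not the code in this PR: the `Row` and `FirstRowSketch` classes, the `resolve_ignore_delete` helper, the `False` default, and the `"true"` string parsing are all assumptions made for the example. Only the option keys and their lookup order are taken from the list above.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Option keys in the fallback order listed in this PR description.
IGNORE_DELETE_KEYS = (
    "ignore-delete",
    "first-row.ignore-delete",
    "deduplicate.ignore-delete",
    "partial-update.ignore-delete",
)


def resolve_ignore_delete(options: Mapping[str, str]) -> bool:
    """Return the value of the first configured key (assumed default: False)."""
    for key in IGNORE_DELETE_KEYS:
        if key in options:
            return str(options[key]).strip().lower() == "true"
    return False


@dataclass(frozen=True)
class Row:
    """Hypothetical stand-in for a key/value record."""
    key: int
    value: str
    is_add: bool = True  # False models a retract row (delete / update-before)


class FirstRowSketch:
    """Keeps the first add-type row for one key; call reset() between keys."""

    def __init__(self, ignore_delete: bool = False) -> None:
        self.ignore_delete = ignore_delete
        self._first: Optional[Row] = None

    def reset(self) -> None:
        self._first = None

    def add(self, row: Row) -> None:
        if not row.is_add:
            if self.ignore_delete:
                return  # silently skip retract rows
            raise ValueError(
                "first-row sketch cannot accept retract rows "
                "unless ignore-delete is enabled"
            )
        if self._first is None:
            # Row is immutable here, so holding the reference is safe.
            # A reader that reuses mutable row objects would need to
            # copy the row at this point instead.
            self._first = row

    def get_result(self) -> Optional[Row]:
        return self._first


merge = FirstRowSketch(resolve_ignore_delete({"first-row.ignore-delete": "true"}))
for row in (Row(1, "a"), Row(1, "gone", is_add=False), Row(1, "c")):
    merge.add(row)
result = merge.get_result()
print(result.value)  # a
```

With `ignore-delete` left unset, the same sequence would raise on the retract row instead of skipping it, which corresponds to the "retract rejection" case listed under Testing.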
   
   ### Testing
   
   - 11 unit tests covering: first-row-wins semantics, reset/reuse, 
ignore-delete on/off, retract rejection, KeyValue reuse safety
   - 4 end-to-end tests: single/multi-key writes, three-write override, 
single-write passthrough


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

