TheR1sing3un commented on PR #7745:
URL: https://github.com/apache/paimon/pull/7745#issuecomment-4358389651
> Nit: thanks for adding the coverage. I wonder if you can make it a bit
more focused though — some unit and e2e cases seem to overlap on the same
last-non-null semantics, and the unsupported-option cases could probably be
table-driven.
Verified -- and the gap is independent of this PR.
I added two coverage cases per your suggestion (commit 0d58859d0):
`test_partial_update_two_write_arrows_single_commit` /
`test_partial_update_three_write_arrows_single_commit`. Both fail, so
they're marked `unittest.expectedFailure` for now.
To rule out anything this PR introduced, I ran the same workload on
plain `origin/master` (b4e54ada3) using the **default `deduplicate`**
merge engine -- two `write_arrow` calls + one `prepare_commit` with
the same PK in both batches:
```
w.write_arrow(pa.Table.from_pylist([{'id': 1, 'a': 'first', 'b': 'old'}],
...))
w.write_arrow(pa.Table.from_pylist([{'id': 1, 'a': 'second', 'b': 'new'}],
...))
c.commit(w.prepare_commit())
# Expected: 1 row -- {'id': 1, 'a': 'second', 'b': 'new'} (dedupe to
latest)
# Actual on master: 2 rows -- {'id': 1, 'a': 'first', ...}, {'id': 1, 'a':
'second', ...}
```
The flushed data file contains 2 rows for the same PK; on read,
`_build_split_from_pack` (split_generator.py:99-100) marks any
single-file PK split as `raw_convertible=True`, which routes through
the raw-convertible fast path and skips `SortMergeReader`. Even on
master the dedupe path silently returns both rows -- so what this PR
exposes is a pre-existing user-facing data-quality issue that affects
the default PK-table configuration. It's not a regression introduced
by the partial-update dispatch.
Root cause split across the write and read paths:
* `KeyValueDataWriter._merge_data` only does `concat + sort` -- no
merge function is applied at flush time, so the file holds duplicate
primary keys (violates the LSM "PK unique within a file" invariant
that Java enforces in `SortBufferWriteBuffer.forEach` /
`MergeTreeWriter.flushWriteBuffer`).
* The read-side `raw_convertible` shortcut assumes that invariant
holds, so it skips `SortMergeReader` -- where the merge-engine
dispatch (this PR) and the existing dedupe both live.
Fixing it requires either (a) an in-memory merge buffer in
`KeyValueDataWriter` mirroring Java's `SortBufferWriteBuffer` (which
applies the table's `MergeFunction` during flush -- the proper fix,
also benefits dedupe/aggregation/first-row), or (b) a stricter
`raw_convertible` check that proves intra-file PK uniqueness. Both
are write-path / scan-path restructuring, well outside the read-side
scope of this PR. I'll open a separate PR for it -- the dedupe data-
quality issue is severe enough on its own to warrant a dedicated fix
rather than buried in this merge-engine port.
The expectedFailure cases here will turn into passing regressions
once that follow-up lands.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]