yugeeklab opened a new issue, #8204:
URL: https://github.com/apache/paimon/issues/8204

   ### Search before asking
   
   - [x] I searched in the [issues](https://github.com/apache/paimon/issues) 
and found nothing similar.
   
   ### Paimon version
   
   master (1.5-SNAPSHOT), also present in release-1.4
   
   ### Compute Engine
   
   Observed with Flink writer + Spark streaming consumer, but the bug is 
engine-independent (paimon-core compaction path).
   
   ### Minimal reproduce step
   
   The unit test in the linked PR reproduces the bug deterministically. A lookup 
hit whose position is already marked as deleted in the current deletion vector 
is used as the changelog BEFORE image, so a re-insert with identical content 
produces no changelog at all. The test fails without the fix and passes with it.
   
   Conceptual sequence on a table with `changelog-producer = lookup`, 
`deletion-vectors.enabled = true`, `changelog-producer.row-deduplicate = true`:
   
   1. Key K exists with content C (row in a high-level file F, alive).
   2. K is deleted. Compaction window 1 emits `-D` (correct), marks F's row 
position in the deletion vector, and **drops the delete record from the 
output** (with DVs enabled `dropDelete` is true for any non-zero output level, 
see `MergeTreeCompactManager`).
   3. K is re-created with the same content C (only fields listed in 
`row-deduplicate-ignore-fields` differ).
   4. Compaction window 2: `pickHighLevel` finds nothing (the tombstone was 
dropped). The lookup is served by a **cached lookup file built before the DV 
update** (`LookupLevels` caches per data file name; data files are immutable, 
so the cache is never rebuilt and the only invalidation hook is file drop). It 
returns the pre-delete row C as BEFORE.
   5. `LookupChangelogMergeFunctionWrapper#setChangelog`: BEFORE=C (add), 
AFTER=C (add), `valueEqualiser.equals` is true → **no changelog emitted**.
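   
   The sequence above can be sketched as a toy model. This is only an 
illustration under simplifying assumptions: `StaleLookupDemo`, `CachedRow`, 
`cacheRow`, `delete` and `reinsert` are hypothetical names, not Paimon classes 
or methods, and the cache and deletion vector are reduced to plain collections.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of steps 1-5. Illustrative types only, not Paimon classes. */
final class StaleLookupDemo {

    /** A cached row: content plus its location in an immutable data file. */
    static final class CachedRow {
        final String file;
        final long position;
        final String content;

        CachedRow(String file, long position, String content) {
            this.file = file;
            this.position = position;
            this.content = content;
        }
    }

    // Stand-in for a per-file lookup cache: built once, never rebuilt.
    private final Map<String, CachedRow> lookupCache = new HashMap<>();
    // Stand-in for the current deletion vector, as "file@position" entries.
    private final Set<String> deletedPositions = new HashSet<>();

    /** Step 1: the key is alive in a high-level file and gets cached. */
    void cacheRow(String key, String file, long position, String content) {
        lookupCache.put(key, new CachedRow(file, position, content));
    }

    /** Step 2: emit -D and mark the position deleted; the cache stays untouched. */
    String delete(String key) {
        CachedRow row = lookupCache.get(key);
        deletedPositions.add(row.file + "@" + row.position);
        return "-D";
    }

    /** Steps 4-5: BEFORE comes from the stale cache, then row-level dedup applies. */
    String reinsert(String key, String content) {
        CachedRow before = lookupCache.get(key); // never consults deletedPositions
        if (before == null) {
            return "+I";
        }
        // Equal BEFORE and AFTER content: dedup suppresses the changelog entirely.
        return before.content.equals(content) ? "" : "-U,+U";
    }

    public static void main(String[] args) {
        StaleLookupDemo table = new StaleLookupDemo();
        table.cacheRow("K", "F", 0L, "C");
        String window1 = table.delete("K");
        String window2 = table.reinsert("K", "C");
        // Prints: window 1: [-D], window 2: []
        System.out.println("window 1: [" + window1 + "], window 2: [" + window2 + "]");
    }
}
```

   In this model the deletion is recorded in `deletedPositions`, but `reinsert` 
never reads it, which is the gap the issue describes.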
   
   The net changelog stream for K is `... , -D` while the table holds a live 
row. Downstream CDC consumers diverge permanently, because no later event 
repairs them.
   
   ### What doesn't meet your expectations?
   
   The re-insert in step 3 must emit `+I` (or `-U/+U`), because a `-D` was 
already emitted for the same key in an earlier compaction. Row-level dedup 
compares against the stale pre-delete row instead of the current (deleted) 
state.
   
   Observed in production on a ~300k-key table with periodic delete/re-create 
churn: a steady drip of keys whose changelog ends with `-D` while the table row 
is alive. Verified by reading `$audit_log` with `incremental-between` over the 
suspect window together with a control rowkind count (the control was 
non-empty, while the suspect keys had zero events).
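   
   The symptom check can be modeled roughly as follows. This is not the query 
used in production; it is a minimal sketch that assumes the changelog events and 
the set of live keys have already been read into memory. `DivergenceCheck`, 
`Event` and `divergedKeys` are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Flags keys whose changelog ends with -D although the table row is alive. */
final class DivergenceCheck {

    /** One changelog event: primary key plus row kind (+I, -U, +U, -D). */
    static final class Event {
        final String key;
        final String rowKind;

        Event(String key, String rowKind) {
            this.key = key;
            this.rowKind = rowKind;
        }
    }

    /**
     * @param changelogInOrder events in emission order
     * @param liveKeys keys that currently have a live row in the table
     * @return sorted keys whose last event is -D while the row is alive
     */
    static Set<String> divergedKeys(List<Event> changelogInOrder, Set<String> liveKeys) {
        // Later events overwrite earlier ones, leaving the last row kind per key.
        Map<String, String> lastKind = new HashMap<>();
        for (Event event : changelogInOrder) {
            lastKind.put(event.key, event.rowKind);
        }
        Set<String> diverged = new TreeSet<>();
        for (String key : liveKeys) {
            if ("-D".equals(lastKind.get(key))) {
                diverged.add(key);
            }
        }
        return diverged;
    }

    public static void main(String[] args) {
        List<Event> log = Arrays.asList(
                new Event("K", "+I"), new Event("K", "-D"), new Event("J", "+I"));
        Set<String> live = new HashSet<>(Arrays.asList("K", "J"));
        // Prints: [K]
        System.out.println(divergedKeys(log, live));
    }
}
```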
   
   ### Anything else?
   
   Proposed fix (PR follows): validate lookup hits against the current deletion 
vector before returning them from `LookupLevels`. The positions are already 
available via `PositionedKeyValue`/`FilePosition`. A deleted hit is reported as 
absent and the search stops there: deeper levels only hold older versions of 
the key, so continuing the search would return a wrong BEFORE image as well.
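   
   The idea behind the fix can be sketched as below. This does not reproduce 
the PR or the real `LookupLevels` API; `DvAwareLookupSketch`, `Hit`, `addLevel`, 
`markDeleted` and `lookup` are hypothetical names, and levels are assumed to be 
searched from newest to oldest.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Lookup that validates each hit against the current deletion vector. */
final class DvAwareLookupSketch {

    /** A lookup hit: where the row lives and what it contains. */
    static final class Hit {
        final String file;
        final long position;
        final String content;

        Hit(String file, long position, String content) {
            this.file = file;
            this.position = position;
            this.content = content;
        }
    }

    // Per-level lookup indexes, ordered from newest to oldest (an assumption).
    private final List<Map<String, Hit>> levels = new ArrayList<>();
    // Stand-in for the current deletion vector, as "file@position" entries.
    private final Set<String> deletedPositions = new HashSet<>();

    void addLevel(Map<String, Hit> index) {
        levels.add(index);
    }

    void markDeleted(String file, long position) {
        deletedPositions.add(file + "@" + position);
    }

    /** Returns the live content for the key, or empty if absent or deleted. */
    Optional<String> lookup(String key) {
        for (Map<String, Hit> level : levels) {
            Hit hit = level.get(key);
            if (hit == null) {
                continue; // not in this level, try the next (older) one
            }
            if (deletedPositions.contains(hit.file + "@" + hit.position)) {
                // Newest version is deleted: report absent and stop, because
                // deeper levels only hold older versions of the key.
                return Optional.empty();
            }
            return Optional.of(hit.content);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        DvAwareLookupSketch lookup = new DvAwareLookupSketch();
        lookup.addLevel(Map.of("K", new Hit("F", 0L, "C")));
        lookup.addLevel(Map.of("K", new Hit("G", 7L, "older")));
        lookup.markDeleted("F", 0L);
        // Prints: false
        System.out.println(lookup.lookup("K").isPresent());
    }
}
```

   With a deleted hit reported as absent, the re-insert in step 3 sees no 
BEFORE image, so the dedup comparison is skipped and a `+I` can be emitted.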
   
   Related: `LocalTableQuery` carries a `// TODO pass DeletionVector factory` 
comment, which is the same integration gap on the read path.
   
   ### Are you willing to submit a PR?
   
   - [x] I'm willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
