yugeeklab opened a new issue, #8204: URL: https://github.com/apache/paimon/issues/8204
### Search before asking - [x] I searched in the [issues](https://github.com/apache/paimon/issues) and found nothing similar. ### Paimon version master (1.5-SNAPSHOT), also present in release-1.4 ### Compute Engine Observed with Flink writer + Spark streaming consumer, but the bug is engine-independent (paimon-core compaction path). ### Minimal reproduce step Unit test in the linked PR reproduces it deterministically: a lookup hit whose position is already marked deleted in the current deletion vector is used as the changelog BEFORE image; a re-insert with identical content then produces no changelog at all (fails without the fix, passes with it). Conceptual sequence on a table with `changelog-producer = lookup`, `deletion-vectors.enabled = true`, `changelog-producer.row-deduplicate = true`: 1. Key K exists with content C (row in a high-level file F, alive). 2. K is deleted. Compaction window 1 emits `-D` (correct), marks F's row position in the deletion vector, and **drops the delete record from the output** (with DVs enabled `dropDelete` is true for any non-zero output level, see `MergeTreeCompactManager`). 3. K is re-created with the same content C (only fields listed in `row-deduplicate-ignore-fields` differ). 4. Compaction window 2: `pickHighLevel` finds nothing (the tombstone was dropped). The lookup is served by a **cached lookup file built before the DV update** (`LookupLevels` caches per data file name; data files are immutable, so the cache is never rebuilt and the only invalidation hook is file drop). It returns the pre-delete row C as BEFORE. 5. `LookupChangelogMergeFunctionWrapper#setChangelog`: BEFORE=C (add), AFTER=C (add), `valueEqualiser.equals` is true → **no changelog emitted**. Net changelog stream for K: `... , -D` while the table holds a live row — downstream CDC consumers are permanently diverged and there is no later event that repairs them. ### What doesn't meet your expectations? The re-insert in step 3 must emit `+I` (or `-U/+U`), because a `-D` was already emitted for the same key in an earlier compaction. Row-level dedup compares against the stale pre-delete row instead of the current (deleted) state. Observed in production on a ~300k-key table with periodic delete/re-create churn: a steady drip of keys whose changelog ends with `-D` while the table row is alive. Verified by reading `$audit_log` with `incremental-between` over the suspect window together with a control rowkind count (control non-empty, suspect keys zero events). ### Anything else? Proposed fix (PR follows): validate lookup hits against the current deletion vector before returning them from `LookupLevels` — positions are already available via `PositionedKeyValue`/`FilePosition`. A deleted hit is reported as absent; deeper levels only hold older versions of the key, so continuing the search would be wrong as well. Related: `LocalTableQuery` carries a `// TODO pass DeletionVector factory`, the same integration gap on the read path. ### Are you willing to submit a PR? - [x] I'm willing to submit a PR! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
