yugeeklab opened a new pull request, #8206: URL: https://github.com/apache/paimon/pull/8206
### Purpose

Linked issue: close #8204

With deletion vectors enabled, delete records are dropped from compaction output at any non-zero output level, so the deletion of a key only lives in the deletion vector of the file holding the old row. `LookupLevels` caches lookup files per data file name and freezes the deletion state as of build time. Data files are immutable but their deletion vectors are not, so a cached lookup file can keep serving a row that has since been marked deleted.

Such a stale hit corrupts every consumer of the lookup. In particular, the lookup changelog producer uses it as the changelog BEFORE image: a re-insert with content identical to the pre-delete row (modulo `changelog-producer.row-deduplicate-ignore-fields`) is judged "no change" and produces no changelog, although a `-D` was already emitted by an earlier compaction. Downstream CDC consumers end up permanently diverged: the table holds a live row while the changelog stream says it was deleted.

This PR validates the hit's position against the current deletion vector before returning it from `LookupLevels.lookup`. A deleted hit means the newest version of the key in the searched levels is gone; deeper levels only hold older versions, so the key is reported as absent rather than continuing the search. Hits without position information (value-only processors) keep the previous behaviour.

### Tests

`LookupLevelsTest#testLookupRespectsDeletionVectorUpdates` exercises the real lookup-file cache:

1. control: lookup returns the live row and warms the cache,
2. a deletion of an unrelated position in the same file does not affect the live row,
3. after marking the returned position deleted (cache not rebuilt), the same lookup returns null.

The test fails without the fix and passes with it. Full paimon-core suite: 0 failures (remaining errors are environmental: docker-dependent tests and a pre-existing JDK/Hadoop `Subject.getSubject` incompatibility, identical on master).
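For reviewers who want the idea in isolation, here is a minimal, self-contained sketch of the check described above. It is not the Paimon implementation: `DeletionAwareLookupSketch`, `Hit`, `cache`, and `markDeleted` are hypothetical names, the per-level maps stand in for cached lookup files, and a plain `Set<Long>` per file name stands in for a deletion vector. A negative position is used here (as an assumption) to model a hit without position information.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a lookup that validates cached hits against
// the *current* deletion state instead of the state at cache build time.
class DeletionAwareLookupSketch {

    /** A cached hit: the row value plus where it lives in its data file. */
    static final class Hit {
        final String fileName;
        final long position; // negative = no position information (assumption)
        final String value;

        Hit(String fileName, long position, String value) {
            this.fileName = fileName;
            this.position = position;
            this.value = value;
        }
    }

    // levels.get(0) is searched first and holds the newest versions.
    private final List<Map<String, Hit>> levels = new ArrayList<>();

    // Mutable deletion state per data file; updated after the cache is built.
    private final Map<String, Set<Long>> deletedPositions = new HashMap<>();

    DeletionAwareLookupSketch(int levelCount) {
        for (int i = 0; i < levelCount; i++) {
            levels.add(new HashMap<>());
        }
    }

    void cache(int level, String key, Hit hit) {
        levels.get(level).put(key, hit);
    }

    void markDeleted(String fileName, long position) {
        deletedPositions.computeIfAbsent(fileName, f -> new HashSet<>()).add(position);
    }

    /** Returns the live value for the key, or null if it is absent or deleted. */
    String lookup(String key) {
        for (Map<String, Hit> level : levels) {
            Hit hit = level.get(key);
            if (hit == null) {
                continue; // not in this level, try the next (older) one
            }
            if (hit.position >= 0 && isDeleted(hit)) {
                // Newest version is deleted; older levels only hold older
                // versions, so report the key as absent and stop searching.
                return null;
            }
            return hit.value;
        }
        return null;
    }

    private boolean isDeleted(Hit hit) {
        Set<Long> deleted = deletedPositions.get(hit.fileName);
        return deleted != null && deleted.contains(hit.position);
    }

    public static void main(String[] args) {
        DeletionAwareLookupSketch lookup = new DeletionAwareLookupSketch(2);
        lookup.cache(0, "k1", new Hit("data-new", 3L, "row-v2"));
        lookup.cache(1, "k1", new Hit("data-old", 0L, "row-v1"));

        System.out.println(lookup.lookup("k1")); // control: live row
        lookup.markDeleted("data-new", 7L);      // unrelated position
        System.out.println(lookup.lookup("k1")); // still the live row
        lookup.markDeleted("data-new", 3L);      // the returned position
        System.out.println(lookup.lookup("k1")); // absent, no fallback to row-v1
    }
}
```

The `main` method mirrors the three test steps listed above. The second level is only there to show that a deleted hit does not fall through to an older version of the key.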
🤖 Generated with [Claude Code](https://claude.com/claude-code)
