JunRuiLee opened a new pull request, #7948: URL: https://github.com/apache/paimon/pull/7948
## Motivation

In some production shared-dataset deployments, compaction is separated from writers and runs as dedicated jobs. For DV primary-key tables, batch scans skip level-0 files by default, so newly written uncompacted data is not visible until compaction finishes. This means readers depend on compaction progress to see the latest data.

`visibility-callback.enabled` does not fit this requirement because it also relies on compaction: commits are returned only after compaction makes the data visible through the optimized read path. In this scenario, users want to read the latest committed data without waiting for compaction.

## Changes

This PR adds a new option `deletion-vectors.merge-on-read` for DV tables. When enabled, batch scans include DV level-0 files and merge them at read time, so readers can see uncompacted committed data without depending on compaction. This trades read performance for fresher batch results.

This option only affects batch scan visibility of DV level-0 files. It does not change streaming scan or changelog behavior.

- Add `deletion-vectors.merge-on-read` core option (default `false`).
- Use it to control whether DV batch scans skip level-0 files (`CoreOptions.batchScanSkipLevel0()`).
- Validate mutual exclusion with `visibility-callback.enabled` in `SchemaValidation`.
- Add core-level tests for batch read behavior with and without merge-on-read.
- Add Flink IT test for merge-on-read with write-only mode.
- Update generated config documentation.
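
The interaction between the options described above can be sketched in a few lines of Java. This is a simplified, hypothetical model and not the PR's actual code: the class `DvMergeOnReadSketch`, its helper methods, and the use of a plain `Map<String, String>` are assumptions made for illustration, and the real `CoreOptions.batchScanSkipLevel0()` may take other settings into account. Only the option keys come from Paimon; `deletion-vectors.enabled` is the existing switch that makes a table a DV table.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the option semantics; not Paimon's implementation. */
public class DvMergeOnReadSketch {
    static final String DV_ENABLED = "deletion-vectors.enabled";
    static final String DV_MERGE_ON_READ = "deletion-vectors.merge-on-read";
    static final String VISIBILITY_CALLBACK = "visibility-callback.enabled";

    /** Reads a boolean option; an unset option is treated as false. */
    static boolean flag(Map<String, String> options, String key) {
        return Boolean.parseBoolean(options.getOrDefault(key, "false"));
    }

    /** Assumed rule: a DV batch scan skips level-0 files unless merge-on-read is on. */
    static boolean batchScanSkipLevel0(Map<String, String> options) {
        return flag(options, DV_ENABLED) && !flag(options, DV_MERGE_ON_READ);
    }

    /** Assumed validation: the two visibility mechanisms are mutually exclusive. */
    static void validate(Map<String, String> options) {
        if (flag(options, DV_MERGE_ON_READ) && flag(options, VISIBILITY_CALLBACK)) {
            throw new IllegalArgumentException(
                    DV_MERGE_ON_READ + " cannot be combined with " + VISIBILITY_CALLBACK);
        }
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put(DV_ENABLED, "true");
        // Default behavior: level-0 files are skipped by batch scans.
        System.out.println("skip level-0 (default): " + batchScanSkipLevel0(options));

        options.put(DV_MERGE_ON_READ, "true");
        validate(options);
        // With merge-on-read, level-0 files are included and merged at read time.
        System.out.println("skip level-0 (merge-on-read): " + batchScanSkipLevel0(options));
    }
}
```

In this model, a write-only job paired with a separate compaction job would set `deletion-vectors.merge-on-read` to `true` on the table so that batch readers see committed level-0 data immediately, at the cost of merging at read time.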
