JunRuiLee opened a new pull request, #7948:
URL: https://github.com/apache/paimon/pull/7948

   ## Motivation
   
   In some production shared-dataset deployments, compaction is separated from 
writers and runs as dedicated jobs. For deletion-vector (DV) primary-key tables, 
batch scans skip level-0 files by default, so newly written, uncompacted data is 
not visible until compaction finishes. Readers therefore depend on compaction 
progress to see the latest data.
   
   `visibility-callback.enabled` does not fit this requirement because it also 
relies on compaction: a commit returns only after compaction has made the data 
visible through the optimized read path. In this scenario, users want to read 
the latest committed data without waiting for compaction.
   
   ## Changes
   
   This PR adds a new option `deletion-vectors.merge-on-read` for DV tables. 
When enabled, batch scans include DV level-0 files and merge them at read time, 
so readers can see uncompacted committed data without depending on compaction. 
This trades read performance for fresher batch results.
   
   This option only affects batch scan visibility of DV level-0 files. It does 
not change streaming scan or changelog behavior.
   
   - Add `deletion-vectors.merge-on-read` core option (default `false`).
   - Use it to control whether DV batch scans skip level-0 files 
(`CoreOptions.batchScanSkipLevel0()`).
   - Validate mutual exclusion with `visibility-callback.enabled` in 
`SchemaValidation`.
   - Add core-level tests for batch read behavior with and without 
merge-on-read.
   - Add a Flink integration test (IT) for merge-on-read with write-only mode.
   - Update generated config documentation.
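
The option semantics listed above can be sketched as a small standalone model. This is a simplified illustration, not the code in this PR: the class `MergeOnReadSketch` and its helper methods are hypothetical, the options are read from a plain `Map` rather than through `CoreOptions`, all three flags are assumed to default to `false`, and any other conditions the real `CoreOptions.batchScanSkipLevel0()` may check are ignored.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of the option semantics; not Paimon code. */
final class MergeOnReadSketch {

    static final String DV_ENABLED = "deletion-vectors.enabled";
    static final String DV_MERGE_ON_READ = "deletion-vectors.merge-on-read";
    static final String VISIBILITY_CALLBACK = "visibility-callback.enabled";

    // Reads a boolean option; a missing key is treated as false in this sketch.
    private static boolean flag(Map<String, String> options, String key) {
        return Boolean.parseBoolean(options.getOrDefault(key, "false"));
    }

    /** DV batch scans skip level-0 files unless merge-on-read is enabled. */
    static boolean batchScanSkipLevel0(Map<String, String> options) {
        return flag(options, DV_ENABLED) && !flag(options, DV_MERGE_ON_READ);
    }

    /** Rejects configurations that enable both mutually exclusive options. */
    static void validate(Map<String, String> options) {
        if (flag(options, DV_MERGE_ON_READ) && flag(options, VISIBILITY_CALLBACK)) {
            throw new IllegalArgumentException(
                    DV_MERGE_ON_READ + " and " + VISIBILITY_CALLBACK
                            + " cannot both be enabled");
        }
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put(DV_ENABLED, "true");
        System.out.println("default skips level-0: " + batchScanSkipLevel0(options));

        options.put(DV_MERGE_ON_READ, "true");
        validate(options);
        System.out.println("merge-on-read skips level-0: " + batchScanSkipLevel0(options));
    }
}
```

With only `deletion-vectors.enabled` set, the sketch reports that level-0 files are skipped; adding `deletion-vectors.merge-on-read` flips that, and additionally enabling `visibility-callback.enabled` makes `validate` throw.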

