[PR] [flink] Expose scan.bucket for single-bucket manifest pruning [paimon]

via GitHub Thu, 04 Jun 2026 00:39:40 -0700


wwj6591812 opened a new pull request, #8117:
URL: https://github.com/apache/paimon/pull/8117


   ## Background
   
   ReadBuilder.withBucket(int) and manifest scanning already support reading a 
single bucket, but Flink SQL has no connector option that exposes this. Operators 
often need to debug or inspect one bucket of a fixed-bucket primary-key table 
without reading every bucket.
   
   ## Why this PR
   
   Expose scan.bucket in Flink so users can run:
   
     SELECT * FROM t /*+ OPTIONS('scan.bucket' = '0') */
   
   and plan splits only for that bucket.
   
   ## What changes
   
   - Add FlinkConnectorOptions.SCAN_BUCKET (scan.bucket).
   - ScanBucketUtils.applyScanBucket() reads the option and calls 
ReadBuilder.withBucket().
   - Wire into FlinkSourceBuilder and FlinkTableSource (batch and split 
inference).
   - Validate in ReadBuilderImpl.withBucket() (canonical read path): the 
bucket id must be non-negative, the table must be a FileStoreTable, the table 
must not be in postpone-bucket mode, and the bucket id must be less than 
table.bucket when table.bucket > 0.
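
The option handling and validation rules listed above can be sketched in plain Java. This is a minimal illustration, not the code in this PR: the class `ScanBucketSketch`, its method names, and the `isFileStoreTable` flag are hypothetical, and treating `-2` as the postpone-bucket marker is an assumption made for this sketch.

```java
import java.util.Map;
import java.util.OptionalInt;

/** Hypothetical sketch of reading 'scan.bucket' and validating the bucket id. */
final class ScanBucketSketch {

    // Assumption for this sketch: a table bucket setting of -2 means postpone-bucket mode.
    static final int POSTPONE_BUCKET = -2;

    /** Returns the configured bucket id, or empty when 'scan.bucket' is not set. */
    static OptionalInt readScanBucket(Map<String, String> options) {
        String raw = options.get("scan.bucket");
        if (raw == null) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(Integer.parseInt(raw.trim()));
    }

    /** Applies the four checks from the PR description and fails fast on the first violation. */
    static void validate(int bucket, boolean isFileStoreTable, int tableBuckets) {
        if (bucket < 0) {
            throw new IllegalArgumentException("bucket must be non-negative: " + bucket);
        }
        if (!isFileStoreTable) {
            throw new IllegalArgumentException("scan.bucket requires a FileStoreTable");
        }
        if (tableBuckets == POSTPONE_BUCKET) {
            throw new IllegalArgumentException("scan.bucket is not supported in postpone-bucket mode");
        }
        // The upper bound is only checked for fixed-bucket tables (table bucket > 0).
        if (tableBuckets > 0 && bucket >= tableBuckets) {
            throw new IllegalArgumentException(
                    "bucket " + bucket + " is out of range for a table with " + tableBuckets + " buckets");
        }
    }
}
```

Under these assumptions, `validate(3, true, 4)` passes, while `validate(4, true, 4)` and `validate(-1, true, 4)` throw before any scan is planned.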
   
   Stage optimized: scan / manifest planning. Fewer manifest entries and 
splits are planned before the read. Merge and per-record logic are unchanged.
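
To illustrate why a bucket filter reduces planning work, here is a small self-contained sketch of pruning manifest entries by bucket before splits are built. The `Entry` type and `pruneByBucket` method are hypothetical stand-ins, not Paimon classes; the real scan path applies its own filters to its own manifest structures.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: keep only the manifest entries that belong to one bucket. */
final class ManifestPruningSketch {

    /** Simplified stand-in for a manifest entry: a data file tagged with partition and bucket. */
    static final class Entry {
        final String partition;
        final int bucket;
        final String fileName;

        Entry(String partition, int bucket, String fileName) {
            this.partition = partition;
            this.bucket = bucket;
            this.fileName = fileName;
        }
    }

    /** Returns the entries whose bucket equals the requested bucket, preserving order. */
    static List<Entry> pruneByBucket(List<Entry> entries, int bucket) {
        List<Entry> kept = new ArrayList<>();
        for (Entry e : entries) {
            if (e.bucket == bucket) {
                kept.add(e);
            }
        }
        return kept;
    }
}
```

With entries spread over several buckets, only the files of the selected bucket reach split generation, so the number of planned splits shrinks in proportion to the entries dropped.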
   
   ## Tests
   
   - ScanBucketUtilsTest: an invalid bucket id fails fast.
   - ScanBucketITCase: a SQL query with scan.bucket returns the same rows as 
reading that bucket via the table API.
   
   ## Test plan
   
   - [ ] mvn test -pl paimon-flink/paimon-flink-common -am 
-Dtest=ScanBucketUtilsTest,ScanBucketITCase


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

