wwj6591812 opened a new pull request, #8117:
URL: https://github.com/apache/paimon/pull/8117
## Background
ReadBuilder.withBucket(int) and manifest scanning already support reading a
single bucket, but Flink SQL had no connector option to expose it. Operators
often need to debug or scan one bucket of a fixed-bucket primary-key table
without reading all buckets.
## Why this PR
Expose scan.bucket in Flink so users can run:
SELECT * FROM t /*+ OPTIONS('scan.bucket' = '0') */
and plan splits only for that bucket.
## What changes
- Add FlinkConnectorOptions.SCAN_BUCKET (scan.bucket).
- ScanBucketUtils.applyScanBucket() reads the option and calls
ReadBuilder.withBucket().
- Wire into FlinkSourceBuilder and FlinkTableSource (batch and split
inference).
- Validate in ReadBuilderImpl.withBucket() (canonical read path): the bucket
id must be non-negative, the table must be a FileStoreTable, the table must
not be in postpone-bucket mode, and the bucket id must be less than
table.bucket when table.bucket > 0.
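
The option handling and validation rules listed above can be sketched roughly as follows. This is a standalone illustration, not the code in this PR: the class name `ScanBucketSketch`, both method names, and the boolean parameters are hypothetical stand-ins for the real checks against the table type and bucket mode. Only the option key `scan.bucket` and the four rules come from the description.

```java
import java.util.Map;
import java.util.OptionalInt;

/** Illustrative sketch only; every name in this class is hypothetical, not Paimon API. */
final class ScanBucketSketch {

    /** Option key as named in the PR description. */
    static final String SCAN_BUCKET_KEY = "scan.bucket";

    private ScanBucketSketch() {}

    /** Reads scan.bucket from the options; an empty result means "read all buckets". */
    static OptionalInt parseScanBucket(Map<String, String> options) {
        String raw = options.get(SCAN_BUCKET_KEY);
        if (raw == null) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(raw.trim()));
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                    "scan.bucket must be an integer, got: " + raw, e);
        }
    }

    /**
     * Applies the four rules from the description. The boolean flags stand in for
     * the real table-type and bucket-mode checks (an assumption of this sketch).
     */
    static void validate(
            int bucket, boolean isFileStoreTable, boolean isPostponeBucket, int tableBuckets) {
        if (bucket < 0) {
            throw new IllegalArgumentException("bucket must be non-negative: " + bucket);
        }
        if (!isFileStoreTable) {
            throw new IllegalArgumentException("bucket filter requires a FileStoreTable");
        }
        if (isPostponeBucket) {
            throw new IllegalArgumentException("bucket filter is not supported in postpone-bucket mode");
        }
        // The upper bound is only checked when the table declares a fixed bucket count.
        if (tableBuckets > 0 && bucket >= tableBuckets) {
            throw new IllegalArgumentException(
                    "bucket " + bucket + " is out of range for a table with " + tableBuckets + " buckets");
        }
    }

    public static void main(String[] args) {
        OptionalInt bucket = parseScanBucket(Map.of(SCAN_BUCKET_KEY, "0"));
        // Validate only when the option was actually set.
        bucket.ifPresent(b -> validate(b, true, false, 4));
        System.out.println("scan.bucket = " + bucket);
    }
}
```

Failing in the validation step, before any scan is planned, matches the "fails fast" behavior that ScanBucketUtilsTest is described as covering.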
Stage optimized: scan / manifest planning. Filtering by bucket reduces the
manifest entries and splits produced before any data is read. Merge and
per-record logic are unchanged.
## Tests
- ScanBucketUtilsTest — invalid bucket id fails fast.
- ScanBucketITCase — SQL with scan.bucket matches reading that bucket via
the table API.
## Test plan
- [ ] mvn test -pl paimon-flink/paimon-flink-common -am -Dtest=ScanBucketUtilsTest,ScanBucketITCase