TheR1sing3un opened a new pull request, #7744:
URL: https://github.com/apache/paimon/pull/7744
## Purpose
Today, on a HASH_FIXED PK table with N buckets, a point query like
```python
table.new_read_builder().with_filter(pb.equal('id',
'X')).new_scan().plan().splits()
```
scans every bucket — N times the manifest decoding, N times the file open.
Java's `org.apache.paimon.operation.BucketSelectConverter` solves this at the
manifest-entry filter layer: an Equal/In predicate on the bucket-key columns
derives a finite set of buckets the query can possibly hit, and entries outside
that set are skipped before any stats decoding.
This PR ports the same pattern to pypaimon. PK point queries now touch a
single bucket; PK `IN` queries touch only the buckets covering the literal set.
## Mechanism
`pypaimon/read/scanner/bucket_select_converter.py` (new):
* Walk the predicate, isolate AND clauses that constrain bucket-key fields
with `equal` / `in`, take the cartesian product of literal values (capped at
`MAX_VALUES=1000`), hash each combination using `_hash_bytes_by_words` /
`_bucket_from_hash` from `pypaimon.write.row_key_extractor`, and return a
callable `selector(bucket, total_buckets) -> bool`.
* Cached per `total_buckets` — handles the rescale case where bucket count
varies between manifest entries.
Conservative scope, deliberately narrower than Java's general flexibility:
* Only HASH_FIXED tables (the caller is responsible for gating).
* All bucket-key fields must be constrained — otherwise `None`, full scan.
* Repeated constraints on the same column under top-level AND (`id = 1 AND
id = 2`) → `None`. Java does the same rather than reasoning about
unsatisfiability.
* Cartesian product cap at `MAX_VALUES = 1000` — above that, fall back to
full scan.
`pypaimon/read/scanner/file_scanner.py`:
* Enable the selector only when `table.bucket_mode() ==
BucketMode.HASH_FIXED`.
* Derive the bucket-key fields by instantiating the writer's
`FixedBucketRowKeyExtractor` and reading `_bucket_key_fields`. Reusing the
writer class — rather than re-implementing bucket-key resolution on the read
side — is the **safety net** against future write-side resolution changes that
would otherwise silently break read/write hash agreement and lose data.
* Apply the selector inside `_filter_manifest_entry`, after the bucket
validity check and before partition / stats decoding, so it short-circuits the
hot path on point queries.
## Soundness
The selector returns a **superset** of the buckets containing matching rows.
False-positive (over-keep) is fine; false-negative is silent data loss and must
never happen.
* `total_buckets <= 0` (legacy / special manifest entries that the writer
placed under a different convention) → fail open.
* Any hashing / serialization error inside the deferred hash — e.g.
`pb.equal('id_bigint', 'foo')` so `GenericRowSerializer.to_bytes` raises
`struct.error` mid-scan — is caught and the selector fails open. Crashing the
entire scan with an opaque error would be a worse user experience than silently
skipping pruning, and the soundness contract is preserved.
## Linked issue
N/A — surfaced when running PK point lookups against tables with non-trivial
bucket counts and seeing every bucket in the resulting splits list.
## Tests
New `pypaimon/tests/pushdown_bucket_test.py`, three layers, 25 cases:
* **Layer 1 — Unit (17 cases)**: direct `create_bucket_selector` calls —
Equal / In / OR-of-Equals / composite-key cartesian / unconstrained-key returns
None / non-bucket-key returns None / range returns None / OR with
non-bucket-key returns None / repeated AND on same key returns None / unrelated
AND clause is unaffected / cartesian above cap returns None / null-only literal
collapses to empty bucket set / no-predicate returns None / no-bucket-keys
returns None / per-`total_buckets` cache (rescale) / `total_buckets <= 0` fails
open / type-mismatched literal fails open.
* **Layer 2 — Integration (5 cases)**: real PK tables (8 buckets), public
API. Asserts BOTH result correctness AND that pruning fired (`{s.bucket for s
in splits}` has the expected size). Includes a "selector must be None"
assertion for the value-only predicate so a buggy selector that prunes wrongly
but happens to keep the test rows still fails.
* **Layer 3 — Property (60 random trials, deterministic seed)**: random
bucket counts × random PKs × random Equal/In, result == oracle. Uses seeded
`random.Random` rather than hypothesis so the PR doesn't introduce a new dev
dependency and the test stays Python 3.6 compatible.
Local: `pytest pypaimon/tests/pushdown_bucket_test.py` → 25 passed;
surrounding suites (`predicates_test`, `reader_split_generator_test`,
`reader_primary_key_test`, `identifier_test`) → 73 passed, 2 failed
(pre-existing lance environment issues unrelated to this PR); `flake8
--config=dev/cfg.ini` clean.
## API and format
No public API change. No file format change. Read result is unchanged for
any predicate; only the set of manifest entries actually decoded shrinks for
predicates the selector can match.
## Documentation
Inline docstrings on `create_bucket_selector`, `_Selector`, and
`FileScanner._init_bucket_selector` document the selector contract (superset
semantics, fail-open conditions, why bucket-key fields come from the writer's
extractor) so future maintainers don't accidentally regress soundness.
## Generative AI disclosure
Drafted with assistance from an AI coding tool; the design follows
`org.apache.paimon.operation.BucketSelectConverter` and the soundness contract
is exercised end-to-end by the three-layer test suite.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]