TheR1sing3un opened a new pull request, #7869:
URL: https://github.com/apache/paimon/pull/7869
## What
Add `ReadBuilder.explain()`, which returns a structured `ExplainResult` so users
can see what a PyPaimon read will actually do: target snapshot, pushed-down
predicate / projection / limit, the partition / bucket / file-stats pruning
funnel, and split-level execution signals (raw-convertible ratio,
deletion-vector ratio, level histogram, split-size skew).
The default `__str__` is a compact debug layout; `verbose=True` lists every
split. `explain()` reads only the manifest list and manifests; data files are
never opened.
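
For readers unfamiliar with the split-level signals, here is a minimal, self-contained sketch of how such aggregates could be derived from split metadata alone. `SplitInfo`, `split_signals`, and the max-over-median skew measure are hypothetical names and choices made for illustration; they are not the classes or formulas used in this PR.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class SplitInfo:
    """Hypothetical per-split metadata; not PyPaimon's actual split class."""
    raw_convertible: bool
    has_deletion_vector: bool
    file_levels: List[int]  # LSM level of each data file in the split
    size_bytes: int


def split_signals(splits: List[SplitInfo]) -> dict:
    """Aggregate split-level signals from metadata only (no data files read)."""
    total = len(splits)
    sizes = sorted(s.size_bytes for s in splits)
    levels = Counter(level for s in splits for level in s.file_levels)
    median = sizes[total // 2] if total else 0
    return {
        "splits": total,
        "raw_convertible": sum(s.raw_convertible for s in splits),
        "with_dv": sum(s.has_deletion_vector for s in splits),
        # A split counts here only if none of its files sit at level 0.
        "all_above_l0": sum(all(lv > 0 for lv in s.file_levels) for s in splits),
        "level_histogram": dict(sorted(levels.items())),
        # Assumed skew proxy: largest split relative to the median split.
        "size_skew": (sizes[-1] / median) if median else None,
    }
```

With a single split holding one L0 file, this yields one raw-convertible split, zero splits with deletion vectors, zero all-above-L0 splits, and a level histogram of `{0: 1}`, which mirrors the shape of the sample output in this description.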
## Why
`Plan` exposes only `splits` and `snapshot_id` today; `FileScanner` already
does partition / bucket / file-stats pruning but none of that is visible to
users. The only way to inspect cost is reading INFO logs or walking
`plan().splits()` by hand. Apache Paimon Java has no SQL EXPLAIN of its own
either (that comes from Flink / Spark); this PR is scoped to scan-plan
visibility, not query planning.
## Sample output
PK + partition + HASH_FIXED bucket, predicate `dt = '2026-05-12' AND id = 7`:
```
== PyPaimon Scan Plan ==
Table: default.demo (PK, HASH_FIXED)
Snapshot: 5 (schema 0)
Predicate: (dt = '2026-05-12') AND (id = 7)
Projection: [dt, id, val]
Limit: 100
Partition pruning: 20 -> 4 (pruned 16)
Bucket pruning: 4 -> 1 (pruned 3)
File skipping: 1 -> 1 (pruned 0)
Splits: 1
raw-convertible: 1 / 1
with DV: 0 / 1
all-above-L0: 0 / 1
files/split: min=1 max=1 avg=1.00
size/split: min=2.6 KiB p50=2.6 KiB p95=2.6 KiB max=2.6 KiB
Files: 1
Total size: 2.6 KiB
Estimated rows: 10 (merged: 10)
Level histogram: L0=1
Deletion files: 0
```
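
The funnel and size lines above can be reproduced with a few small helpers. This is only a sketch of the formatting logic: `funnel_line`, `human_size`, and `percentile` are hypothetical helpers, and the nearest-rank percentile is an assumed method, since the description does not say how `p50` / `p95` are computed.

```python
import math
from typing import List


def funnel_line(label: str, before: int, after: int) -> str:
    """Render one pruning stage as 'before -> after (pruned N)'."""
    return f"{label}: {before} -> {after} (pruned {before - after})"


def human_size(num_bytes: int) -> str:
    """Format a byte count using binary units (KiB, MiB, ...)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    value = float(num_bytes)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            return f"{int(value)} B" if unit == "B" else f"{value:.1f} {unit}"
        value /= 1024


def percentile(sorted_values: List[int], q: float) -> int:
    """Nearest-rank percentile over an ascending, non-empty list."""
    rank = max(1, math.ceil(q / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


print(funnel_line("Partition pruning", 20, 4))
print(human_size(percentile([2662], 95)))
```

With a single split, every percentile collapses to the same value, which is why `min`, `p50`, `p95`, and `max` are identical in the sample.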
## Tests
`pypaimon/tests/read_builder_explain_test.py` covers 7 scenarios:
append-only baseline, PK/partition/bucket pruning funnel, predicate
rendering, verbose splits, empty snapshot, split-level signals, pretty-print
smoke. Full read regression is clean.
## API / format impact
New API only: `ReadBuilder.explain(verbose=False) -> ExplainResult`. Hot
read path untouched — `ScanStats` is opt-in and only enabled by `explain()`.
No data / wire format change. No Java-side change.
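
To illustrate the opt-in idea, the sketch below shows one common way to keep a hot path unchanged when no statistics are requested: the collector is an optional argument that defaults to `None`. The `ScanStats` fields and the `scan_files` function here are hypothetical stand-ins, not the code in this PR.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class ScanStats:
    """Hypothetical counter bag; the PR's real ScanStats may differ."""
    files_before: int = 0
    files_after: int = 0


def scan_files(files: Iterable[dict],
               keep: Callable[[dict], bool],
               stats: Optional[ScanStats] = None) -> List[dict]:
    """Filter file entries; record counts only when a collector is passed."""
    kept = []
    for entry in files:
        if stats is not None:
            stats.files_before += 1
        if keep(entry):
            kept.append(entry)
    if stats is not None:
        stats.files_after = len(kept)
    return kept
```

A regular read would call `scan_files(files, keep)` and get the same result as before, while an explain-style caller would pass `ScanStats()` and read the counters afterwards.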
## Follow-up
A follow-up patch will surface `explain` through the pypaimon CLI
(alongside `cli_sql` / `cli_table`) so users can inspect a query plan from
the command line without writing any Python. A `# TODO` next to
`ReadBuilder.explain` marks the entry point.
## Generative AI usage
Drafted with the help of Claude Code; reviewed and tested locally by the
author.