TheR1sing3un opened a new pull request, #7869:
URL: https://github.com/apache/paimon/pull/7869

   ## What
   
   Add `ReadBuilder.explain()` returning a structured `ExplainResult` so users
   can see what a PyPaimon read will actually do — target snapshot, pushed-down
   predicate / projection / limit, partition / bucket / file-stats pruning
   funnel, and split-level execution signals (raw-convertible ratio, deletion-
   vector ratio, level histogram, split-size skew).
   
   The default `__str__` is a compact debug layout; `verbose=True` lists every
   split. `explain()` reads only the manifest list and manifests; data files are
   never opened.
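
To illustrate the compact-versus-verbose rendering described above, here is a minimal sketch. `ExplainSketch` and `SplitSketch` are hypothetical stand-ins, not the PR's `ExplainResult`; the field names and the choice to store `verbose` on the result object are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SplitSketch:
    # Hypothetical per-split summary; not the PR's actual fields.
    partition: str
    bucket: int
    file_count: int


@dataclass
class ExplainSketch:
    # Hypothetical stand-in for ExplainResult.
    snapshot_id: int
    splits: List[SplitSketch] = field(default_factory=list)
    verbose: bool = False

    def __str__(self) -> str:
        # Compact layout: a fixed-width label column followed by the value.
        lines = [
            "== Scan Plan (sketch) ==",
            "Snapshot:".ljust(20) + str(self.snapshot_id),
            "Splits:".ljust(20) + str(len(self.splits)),
        ]
        if self.verbose:
            # Verbose mode appends one line per split.
            for i, s in enumerate(self.splits):
                lines.append(
                    f"  [{i}] partition={s.partition} "
                    f"bucket={s.bucket} files={s.file_count}"
                )
        return "\n".join(lines)
```

Keeping the result structured and rendering it only in `__str__` means callers can assert on fields in tests instead of parsing text.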
   
   ## Why
   
   `Plan` exposes only `splits` and `snapshot_id` today; `FileScanner` already
   does partition / bucket / file-stats pruning but none of that is visible to
   users. The only way to inspect cost is reading INFO logs or walking
   `plan().splits()` by hand. Apache Paimon Java has no SQL EXPLAIN of its own
   either (that comes from Flink / Spark); this PR is scoped to scan-plan
   visibility, not query planning.
   
   ## Sample output
   
   PK + partition + HASH_FIXED bucket, predicate `dt = '2026-05-12' AND id = 7`:
   
   ```
   == PyPaimon Scan Plan ==
   Table:              default.demo (PK, HASH_FIXED)
   Snapshot:           5  (schema 0)
   Predicate:          (dt = '2026-05-12') AND (id = 7)
   Projection:         [dt, id, val]
   Limit:              100
   
   Partition pruning:  20 -> 4  (pruned 16)
   Bucket pruning:     4 -> 1  (pruned 3)
   File skipping:      1 -> 1  (pruned 0)
   
   Splits:             1
     raw-convertible:  1 / 1
     with DV:          0 / 1
     all-above-L0:     0 / 1
     files/split:      min=1  max=1  avg=1.00
     size/split:       min=2.6 KiB  p50=2.6 KiB  p95=2.6 KiB  max=2.6 KiB
   
   Files:              1
   Total size:         2.6 KiB
   Estimated rows:     10   (merged: 10)
   Level histogram:    L0=1
   Deletion files:     0
   ```
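
The funnel and size lines in the sample can be approximated with a few small helpers. This is a sketch only: `funnel_line`, `human_size`, and `nearest_rank` are hypothetical names, and the nearest-rank percentile method is an assumption, since the PR description does not say how `p50` / `p95` are computed.

```python
from typing import Sequence


def funnel_line(label: str, before: int, after: int) -> str:
    # Pad the label to a 20-character column, as in the sample layout.
    return f"{label:<20}{before} -> {after}  (pruned {before - after})"


def human_size(num_bytes: int) -> str:
    # Binary units with one decimal place, e.g. 2662 bytes -> "2.6 KiB".
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{int(size)} B" if unit == "B" else f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"


def nearest_rank(values: Sequence[int], pct: int) -> int:
    # Nearest-rank percentile (assumed method): the value at rank
    # ceil(pct * n / 100) in sorted order, using integer arithmetic.
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = -(-pct * len(ordered) // 100)
    return ordered[max(rank, 1) - 1]
```

With a single split, every percentile collapses to the same value, which is why the sample shows identical `min`, `p50`, `p95`, and `max`.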
   
   ## Tests
   
   `pypaimon/tests/read_builder_explain_test.py` covers 7 scenarios:
   append-only baseline, PK/partition/bucket pruning funnel, predicate
   rendering, verbose splits, empty snapshot, split-level signals, pretty-print
   smoke. Full read regression is clean.
   
   ## API / format impact
   
   New API only: `ReadBuilder.explain(verbose=False) -> ExplainResult`. Hot
   read path untouched — `ScanStats` is opt-in and only enabled by `explain()`.
   No data / wire format change. No Java-side change.
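
One way to keep the hot path untouched is to pass an optional collector and record counts only when it is present. The sketch below assumes that pattern; the `ScanStats` fields and the `prune_partitions` helper shown here are hypothetical and may not match the PR's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ScanStats:
    # Hypothetical counters; the real ScanStats fields may differ.
    partitions_before: int = 0
    partitions_after: int = 0


def prune_partitions(
    partitions: List[str],
    keep: Callable[[str], bool],
    stats: Optional[ScanStats] = None,
) -> List[str]:
    kept = [p for p in partitions if keep(p)]
    # Normal reads pass stats=None and skip the bookkeeping entirely.
    if stats is not None:
        stats.partitions_before += len(partitions)
        stats.partitions_after += len(kept)
    return kept
```

A caller such as `explain()` would create a `ScanStats`, pass it through the scan, and read the counters afterwards, while ordinary reads leave the argument unset.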
   
   ## Follow-up
   
   A follow-up patch will surface `explain` through the pypaimon CLI
   (alongside `cli_sql` / `cli_table`) so users can inspect a scan plan from
   the command line without writing any Python. A `# TODO` next to
   `ReadBuilder.explain` marks the entry point.
   
   ## Generative AI usage
   
   Drafted with the help of Claude Code; reviewed and tested locally by the
   author.

