LantaoJin opened a new pull request, #97:
URL: https://github.com/apache/datafusion-java/pull/97
## Which issue does this PR close?
- Closes #96 .
## Rationale for this change
`df.explain(true, true)` already runs the plan and attaches per-operator
metrics, but the result is a `DataFrame` of text rows. Programmatic consumers —
query-shape regression tests, operational audit feeds, build-time benchmarks —
have to scrape `output_rows=12345, elapsed_compute=4.2ms` strings out of those
rows. Brittle to upstream wording, ergonomically painful, and the typed metric
values (`Count`, `Time`, `Gauge`) lose their type along the way.
This PR adds a typed accessor `df.executedPlan()` that returns an immutable
`ExecutedPlan` tree once the DataFrame has been executed via `collect()` /
`executeStream()`. Each node carries the operator name, a one-line display
rendering, child nodes, and an `OperatorMetrics` POJO with `OptionalLong`
fields for the well-known metric variants plus a `Map<String, Long>` for custom
counters.
```java
try (DataFrame df = ctx.sql("SELECT count(*) FROM events");
ArrowReader r = df.collect(allocator)) {
while (r.loadNextBatch()) { /* drain */ }
}
ExecutedPlan plan = df.executedPlan();
long rows = plan.metrics().outputRows().orElse(-1L);
```
The contract is post-mortem: `executedPlan()` requires a prior `collect` /
`executeStream` and rejects with `IllegalStateException("call collect() or
executeStream() first")` if called pre-execution. A future PR can extend the
surface to make pre-execution structure inspection available too — that
follow-up is intentionally out of scope here to keep this PR focused on the
metric-snapshot surface.
## What changes are included in this PR?
- New public records `ExecutedPlan` and `OperatorMetrics`.
- New `DataFrame.executedPlan()` method.
- New `proto/executed_plan.proto` (`ExecutedPlanNodeProto`).
- Native side: `executed_plan.rs`.
- Java-side: one new `final long planId` field assigned at construction.
Out of scope (deferred to follow-up PRs):
- Per-partition metric breakdown.
- `Time`/`Gauge`-shaped custom metrics; v1 surfaces `Count`-shaped customs
only.
## Are these changes tested?
Yes. 10 new tests in the `ExecutedPlanTest`.
## Are there any user-facing changes?
Yes, additive only -- no behavior changes for existing callers.
- New public types `ExecutedPlan` and `OperatorMetrics` (records).
- New `DataFrame.executedPlan()` method.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]