LantaoJin opened a new pull request, #97:
URL: https://github.com/apache/datafusion-java/pull/97

   ## Which issue does this PR close?
   
   - Closes #96.
   
   ## Rationale for this change
   
   `df.explain(true, true)` already runs the plan and attaches per-operator 
metrics, but the result is a `DataFrame` of text rows. Programmatic consumers — 
query-shape regression tests, operational audit feeds, build-time benchmarks — 
have to scrape `output_rows=12345, elapsed_compute=4.2ms` strings out of those 
rows. That scraping is brittle to upstream wording changes and ergonomically 
painful, and the typed metric values (`Count`, `Time`, `Gauge`) lose their type 
along the way.
   
   This PR adds a typed accessor `df.executedPlan()` that returns an immutable 
`ExecutedPlan` tree once the DataFrame has been executed via `collect()` / 
`executeStream()`. Each node carries the operator name, a one-line display 
rendering, child nodes, and an `OperatorMetrics` POJO with `OptionalLong` 
fields for the well-known metric variants plus a `Map<String, Long>` for custom 
counters.
   
   ```java
   try (DataFrame df = ctx.sql("SELECT count(*) FROM events");
        ArrowReader r = df.collect(allocator)) {
       while (r.loadNextBatch()) { /* drain */ }
       // Read the plan inside the try block: df is out of scope (and closed) after it.
       ExecutedPlan plan = df.executedPlan();
       long rows = plan.metrics().outputRows().orElse(-1L);
   }
   ```
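
For readers who want a feel for how a query-shape regression test might consume such a tree, here is a minimal sketch. It does not use the PR's classes: `PlanNodeSketch`, `MetricsSketch`, and `PlanShape` are hypothetical stand-ins, and every accessor name other than `metrics()` and `outputRows()` is an assumption based on the prose description above. The operator names in `main` are made-up sample data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical stand-in for OperatorMetrics; only outputRows() is confirmed by the PR text.
record MetricsSketch(OptionalLong outputRows, Map<String, Long> customCounters) {}

// Hypothetical stand-in for ExecutedPlan; accessor names other than metrics() are assumed.
record PlanNodeSketch(String operatorName, String display,
                      List<PlanNodeSketch> children, MetricsSketch metrics) {}

final class PlanShape {
    private PlanShape() {}

    /** Returns operator names in pre-order, suitable for a query-shape assertion. */
    static List<String> operatorNames(PlanNodeSketch root) {
        List<String> names = new ArrayList<>();
        collect(root, names);
        return names;
    }

    // Visits the node first, then each child from left to right.
    private static void collect(PlanNodeSketch node, List<String> out) {
        out.add(node.operatorName());
        for (PlanNodeSketch child : node.children()) {
            collect(child, out);
        }
    }

    public static void main(String[] args) {
        // Sample tree with made-up operator names and metric values.
        PlanNodeSketch scan = new PlanNodeSketch("DataSourceExec", "DataSourceExec",
                List.of(), new MetricsSketch(OptionalLong.of(12345L), Map.of()));
        PlanNodeSketch agg = new PlanNodeSketch("AggregateExec", "AggregateExec",
                List.of(scan), new MetricsSketch(OptionalLong.of(1L), Map.of()));
        System.out.println(operatorNames(agg));
        System.out.println(agg.metrics().outputRows().orElse(-1L));
    }
}
```

Comparing a list of operator names keeps the assertion independent of display wording, which is the point of exposing a typed tree rather than text rows.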
   
   The contract is post-mortem: `executedPlan()` requires a prior `collect` / 
`executeStream` and rejects with `IllegalStateException("call collect() or 
executeStream() first")` if called pre-execution. A future PR can extend the 
surface to make pre-execution structure inspection available too — that 
follow-up is intentionally out of scope here to keep this PR focused on the 
metric-snapshot surface.
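
The post-mortem contract can be illustrated with a small state guard. This is only a sketch under stated assumptions: `FrameSketch` is a hypothetical stand-in, not the PR's `DataFrame`, and the string snapshot merely takes the place of a real `ExecutedPlan`. The exception message is the one quoted above.

```java
// Hypothetical stand-in for a DataFrame that enforces "execute first, inspect after".
final class FrameSketch {
    // Null until the frame has been executed; a real implementation would hold a plan tree.
    private String snapshot;

    /** Simulates execution and records a placeholder snapshot. */
    void collect() {
        snapshot = "plan-snapshot";
    }

    /** Returns the snapshot, or rejects the call if nothing has been executed yet. */
    String executedPlan() {
        if (snapshot == null) {
            throw new IllegalStateException("call collect() or executeStream() first");
        }
        return snapshot;
    }

    public static void main(String[] args) {
        FrameSketch frame = new FrameSketch();
        try {
            frame.executedPlan();
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        frame.collect();
        System.out.println(frame.executedPlan());
    }
}
```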
   
   ## What changes are included in this PR?
   
   - New public records `ExecutedPlan` and `OperatorMetrics`.
   - New `DataFrame.executedPlan()` method.
   - New `proto/executed_plan.proto` (`ExecutedPlanNodeProto`).
   - Native side: `executed_plan.rs`.
   - Java side: one new `final long planId` field assigned at construction.
   
   Out of scope (deferred to follow-up PRs):
   
   - Per-partition metric breakdown.
   - `Time`/`Gauge`-shaped custom metrics; v1 surfaces `Count`-shaped customs 
only.
   
   ## Are these changes tested?
   
   Yes. 10 new tests in `ExecutedPlanTest`.
   
   ## Are there any user-facing changes?
   
   Yes, additive only -- no behavior changes for existing callers.
   
   - New public types `ExecutedPlan` and `OperatorMetrics` (records).
   - New `DataFrame.executedPlan()` method.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

