ShreyeshArangath opened a new issue, #1379:
URL: https://github.com/apache/datafusion-python/issues/1379
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
DataFusion Python currently provides execution metrics only through the
`explain(analyze=True)` output, which displays metrics as formatted console
text. There is no structured Python API to programmatically access per-operator
metrics such as `output_rows`, `elapsed_compute`, `spill_count`, etc.
**Describe the solution you'd like**
Expose a structured Python API to access execution metrics after running a
query:
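```py
# Sketch of the proposed API: `collect_metrics`, `execute_collect`, and
# `metrics()` are the requested additions and may not exist yet
from datafusion import SessionContext, collect_metrics

ctx = SessionContext()
df = ctx.sql("SELECT * FROM table WHERE value > 100")

# Execute the physical plan so that its metrics get populated
plan = df.execution_plan()
plan.execute_collect(ctx)

# Access metrics on the plan
metrics = plan.metrics()
print(f"Rows: {metrics.output_rows}")
print(f"CPU time: {metrics.elapsed_compute} ns")

# Walk every operator in the plan tree
for operator_name, operator_metrics in collect_metrics(plan):
    print(f"{operator_name}: {operator_metrics.output_rows} rows")
```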
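To make the intended traversal behind `collect_metrics` concrete, here is a minimal pure-Python sketch. It assumes a hypothetical plan-node type (`PlanNode`) that holds an optional `OperatorMetrics` record and a list of children; neither class is part of DataFusion Python, and the operator names are only illustrative. The walk is depth-first, parent before children, and skips nodes that carry no metrics.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class OperatorMetrics:
    """Hypothetical per-operator metrics; field names follow the issue."""
    output_rows: int = 0
    elapsed_compute: int = 0  # nanoseconds
    spill_count: int = 0


@dataclass
class PlanNode:
    """Hypothetical stand-in for a physical plan node."""
    name: str
    metrics: Optional[OperatorMetrics] = None  # None until the plan has run
    children: List["PlanNode"] = field(default_factory=list)


def collect_metrics(plan: PlanNode) -> Iterator[Tuple[str, OperatorMetrics]]:
    """Yield (operator_name, metrics) for each node, parent before children.

    Nodes without metrics (e.g. in a plan that was never executed) are skipped.
    """
    stack = [plan]
    while stack:
        node = stack.pop()
        if node.metrics is not None:
            yield node.name, node.metrics
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(node.children))


if __name__ == "__main__":
    scan = PlanNode("DataSourceExec", OperatorMetrics(output_rows=1000, elapsed_compute=50_000))
    flt = PlanNode("FilterExec", OperatorMetrics(output_rows=120, elapsed_compute=20_000), [scan])
    root = PlanNode("CoalesceBatchesExec", OperatorMetrics(output_rows=120, elapsed_compute=1_000), [flt])
    for operator_name, operator_metrics in collect_metrics(root):
        print(f"{operator_name}: {operator_metrics.output_rows} rows")
```

Pre-order keeps the output roughly in the same top-down order in which `explain(analyze=True)` prints operators, and an explicit stack avoids Python's recursion limit on very deep plans.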
**Describe alternatives you've considered**
N/A
**Additional context**
This mirrors the existing Rust API in datafusion::physical_plan::metrics and
makes it accessible from Python. The metrics would only be populated after
execution, matching DataFusion's semantics where metrics are collected during
query execution.
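The "only populated after execution" semantics, combined with DataFusion running each operator over several partitions, suggests accessors that sum per-partition values and return `None` when nothing has been recorded, similar to how the Rust `MetricsSet::output_rows()` returns an `Option`. The sketch below is a hypothetical pure-Python model of that behavior, not the actual binding; `MetricsSet`, `record`, and `sum_by_name` are names chosen here for illustration.

```python
from collections import defaultdict
from typing import Dict, Optional


class MetricsSet:
    """Hypothetical Python model of one operator's metrics.

    Each execution partition reports its own counters; accessors sum them and
    return None when a metric was never recorded (e.g. before execution).
    """

    def __init__(self) -> None:
        # metric name -> {partition index -> accumulated value}
        self._values: Dict[str, Dict[int, int]] = defaultdict(dict)

    def record(self, partition: int, name: str, value: int) -> None:
        """Add `value` to the named counter for one partition."""
        per_partition = self._values[name]
        per_partition[partition] = per_partition.get(partition, 0) + value

    def sum_by_name(self, name: str) -> Optional[int]:
        """Total of a metric across all partitions, or None if never recorded."""
        if name not in self._values:
            return None
        return sum(self._values[name].values())

    @property
    def output_rows(self) -> Optional[int]:
        return self.sum_by_name("output_rows")

    @property
    def elapsed_compute(self) -> Optional[int]:
        """Total compute time in nanoseconds, summed over partitions."""
        return self.sum_by_name("elapsed_compute")


if __name__ == "__main__":
    metrics = MetricsSet()
    print(metrics.output_rows)  # None: nothing has executed yet
    metrics.record(0, "output_rows", 70)
    metrics.record(1, "output_rows", 50)
    print(metrics.output_rows)  # 120
```

Because `elapsed_compute` is summed across partitions in this model, it can exceed the query's wall-clock time on a multi-core run.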