nuno-faria commented on PR #19316:
URL: https://github.com/apache/datafusion/pull/19316#issuecomment-4103724239
@alamb I refactored the previous auto_explain mode to use a new
`PlanObserver` trait. It has one method that is called when the physical plan
is built and another that is called when the plan completes, which receives the
result of the analyze operator.
```rust
pub trait PlanObserver: Send + Sync + 'static + Debug {
    fn plan_created(
        &self,
        id: &str,
        logical_plan: &LogicalPlan,
        physical_plan: &Arc<dyn ExecutionPlan>,
    ) -> Result<()>;

    fn plan_executed(
        &self,
        id: &str,
        explain_result: RecordBatch,
        duration: Duration,
    ) -> Result<()>;
}
```
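To illustrate what a custom observer could look like, here is a rough,
self-contained sketch. It is not code from the PR: `SimplePlanObserver` is a
stand-in that mirrors the trait above with plain strings instead of
`LogicalPlan`, `ExecutionPlan`, and `RecordBatch`, so that it compiles without
DataFusion, and `SlowQueryObserver` and `demo` are hypothetical names.

```rust
use std::fmt::Debug;
use std::sync::Mutex;
use std::time::Duration;

// Stand-in for the trait above (assumption: plans and the explain result are
// plain strings here, and errors are `String`, to avoid a DataFusion dependency).
pub trait SimplePlanObserver: Send + Sync + 'static + Debug {
    fn plan_created(
        &self,
        id: &str,
        logical_plan: &str,
        physical_plan: &str,
    ) -> Result<(), String>;

    fn plan_executed(
        &self,
        id: &str,
        explain_result: String,
        duration: Duration,
    ) -> Result<(), String>;
}

/// Hypothetical observer that remembers only plans slower than a threshold.
#[derive(Debug)]
pub struct SlowQueryObserver {
    threshold: Duration,
    slow: Mutex<Vec<(String, Duration)>>,
}

impl SlowQueryObserver {
    pub fn new(threshold: Duration) -> Self {
        Self { threshold, slow: Mutex::new(Vec::new()) }
    }

    /// Returns a copy of the (id, duration) pairs recorded so far.
    pub fn slow_queries(&self) -> Vec<(String, Duration)> {
        self.slow.lock().unwrap().clone()
    }
}

impl SimplePlanObserver for SlowQueryObserver {
    fn plan_created(&self, _id: &str, _logical: &str, _physical: &str) -> Result<(), String> {
        // Nothing to record at planning time in this sketch.
        Ok(())
    }

    fn plan_executed(
        &self,
        id: &str,
        _explain_result: String,
        duration: Duration,
    ) -> Result<(), String> {
        if duration >= self.threshold {
            let mut slow = self.slow.lock().map_err(|e| e.to_string())?;
            slow.push((id.to_owned(), duration));
        }
        Ok(())
    }
}

/// Feeds two fake executions to the observer and returns what it kept.
pub fn demo() -> Vec<(String, Duration)> {
    let observer = SlowQueryObserver::new(Duration::from_millis(10));
    observer.plan_created("q1", "logical", "physical").unwrap();
    observer.plan_executed("q1", String::new(), Duration::from_millis(5)).unwrap();
    observer.plan_executed("q2", String::new(), Duration::from_millis(20)).unwrap();
    observer.slow_queries()
}
```

Interior mutability (`Mutex`) is used because both methods take `&self` and the
trait requires `Send + Sync`.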
The `AnalyzeExec` operator can now also receive a callback through which it
passes the result. I opted for this approach so that the physical operators do
not depend on the `PlanObserver`. This way we can reuse the code in the analyze
operator to provide the already formatted output.
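The decoupling idea can be sketched roughly as follows. This is only an
illustration under assumed names, not the PR's actual code:
`AnalyzeLikeOperator`, `ResultCallback`, and `run_with_collector` are
hypothetical, and the formatted output is a placeholder string rather than a
`RecordBatch`.

```rust
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

// Hypothetical callback type: receives the formatted output and the duration.
type ResultCallback = Arc<dyn Fn(&str, Duration) + Send + Sync>;

/// Stand-in for an analyze-style operator. It knows nothing about observers;
/// it only invokes an optional callback once its output is ready.
struct AnalyzeLikeOperator {
    callback: Option<ResultCallback>,
}

impl AnalyzeLikeOperator {
    fn new() -> Self {
        Self { callback: None }
    }

    fn with_callback(mut self, callback: ResultCallback) -> Self {
        self.callback = Some(callback);
        self
    }

    fn execute(&self) -> String {
        let start = Instant::now();
        // Placeholder for the already formatted "plan with metrics" output.
        let formatted = String::from("Plan with Metrics: <formatted plan>");
        if let Some(callback) = &self.callback {
            callback(&formatted, start.elapsed());
        }
        formatted
    }
}

/// The session layer wires the operator to whatever consumes the result;
/// here the consumer just collects the formatted strings.
fn run_with_collector() -> (String, Vec<String>) {
    let collected = Arc::new(Mutex::new(Vec::new()));
    let sink = Arc::clone(&collected);
    let operator = AnalyzeLikeOperator::new().with_callback(Arc::new(
        move |output: &str, _duration: Duration| {
            sink.lock().unwrap().push(output.to_owned());
        },
    ));
    let returned = operator.execute();
    let seen = collected.lock().unwrap().clone();
    (returned, seen)
}
```

With this shape, the layer that owns the observer builds the closure, so the
operator depends only on a function type.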
It can be used like this:
```rust
let plan_observer = DefaultPlanObserver::new("auto_explain.txt".to_owned(), 0);
let ctx = SessionContext::new().with_plan_observer(Arc::new(plan_observer));
ctx.sql("create table t (k int, v int)").await?.collect().await?;
// auto explain needs to be enabled
ctx.sql("set datafusion.explain.auto_explain = true").await?.collect().await?;
ctx.sql("select * from t where k = 1 or k = 2 order by v desc limit 5").await?.collect().await?;
```
The `DefaultPlanObserver` writes either through the `log` crate or to a file,
and the output looks like this:
```sql
QUERY: SELECT t.k, t.v FROM t WHERE ((t.k = 1) OR (t.k = 2)) ORDER BY t.v DESC NULLS FIRST LIMIT 5
DURATION: 0.689ms
EXPLAIN:
+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type         | plan                                                                                                                                                                                |
+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | SortExec: TopK(fetch=5), expr=[v@1 DESC], preserve_partitioning=[false], metrics=[output_rows=0, elapsed_compute=13.40µs, output_bytes=0.0 B, output_batches=0, row_replacements=0] |
|                   |   FilterExec: k@0 = 1 OR k@0 = 2, metrics=[output_rows=0, elapsed_compute=1ns, output_bytes=0.0 B, output_batches=0, selectivity=N/A (0/0)]                                         |
|                   |     DataSourceExec: partitions=1, partition_sizes=[0], metrics=[]                                                                                                                   |
|                   |                                                                                                                                                                                     |
+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
```
(If the `sql` feature is not enabled, the SQL query is not written.)
Let me know what you think about the API.