nuno-faria commented on PR #19316:
URL: https://github.com/apache/datafusion/pull/19316#issuecomment-4103724239

   @alamb I refactored the previous auto_explain mode to use a new 
`PlanObserver` trait. It has one method that is called when the physical plan is 
built and another that is called when the plan finishes executing; the latter 
receives the result of the analyze operator.
   
   ```rust
   pub trait PlanObserver: Send + Sync + 'static + Debug {
       fn plan_created(
           &self,
           id: &str,
           logical_plan: &LogicalPlan,
           physical_plan: &Arc<dyn ExecutionPlan>,
       ) -> Result<()>;
       
       fn plan_executed(
           &self,
           id: &str,
           explain_result: RecordBatch,
           duration: Duration,
       ) -> Result<()>;
   }
   ```
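As a rough illustration of how a custom observer could look, here is a minimal, self-contained sketch. It does not use the DataFusion crate: `LogicalPlan`, `PhysicalPlan`, `ExplainOutput` and `Result` are simplified stand-ins for the real types, and `SlowQueryObserver` is a hypothetical name that is not part of this PR. Because both trait methods take `&self`, an observer that accumulates state needs interior mutability (a `Mutex` here).

```rust
use std::fmt::Debug;
use std::sync::Mutex;
use std::time::Duration;

// Stand-ins so the sketch compiles without DataFusion. These are NOT the real
// `LogicalPlan`, `Arc<dyn ExecutionPlan>` or `RecordBatch` types.
#[derive(Debug)]
pub struct LogicalPlan(pub String);
#[derive(Debug)]
pub struct PhysicalPlan(pub String);
#[derive(Debug)]
pub struct ExplainOutput(pub String);

pub type Result<T> = std::result::Result<T, String>;

// Simplified mirror of the trait shown above.
pub trait PlanObserver: Send + Sync + 'static + Debug {
    fn plan_created(
        &self,
        id: &str,
        logical_plan: &LogicalPlan,
        physical_plan: &PhysicalPlan,
    ) -> Result<()>;

    fn plan_executed(
        &self,
        id: &str,
        explain_result: ExplainOutput,
        duration: Duration,
    ) -> Result<()>;
}

// Hypothetical observer that remembers the ids of plans whose execution took
// at least `threshold`.
#[derive(Debug)]
pub struct SlowQueryObserver {
    threshold: Duration,
    slow: Mutex<Vec<(String, Duration)>>,
}

impl SlowQueryObserver {
    pub fn new(threshold: Duration) -> Self {
        Self { threshold, slow: Mutex::new(Vec::new()) }
    }

    // Ids of all plans recorded as slow so far, in arrival order.
    pub fn slow_ids(&self) -> Vec<String> {
        self.slow.lock().unwrap().iter().map(|(id, _)| id.clone()).collect()
    }
}

impl PlanObserver for SlowQueryObserver {
    fn plan_created(&self, _id: &str, _l: &LogicalPlan, _p: &PhysicalPlan) -> Result<()> {
        // Nothing to record at planning time in this sketch.
        Ok(())
    }

    fn plan_executed(&self, id: &str, _explain: ExplainOutput, duration: Duration) -> Result<()> {
        if duration >= self.threshold {
            // A poisoned lock is reported as an error instead of panicking.
            self.slow
                .lock()
                .map_err(|e| e.to_string())?
                .push((id.to_owned(), duration));
        }
        Ok(())
    }
}

fn main() -> Result<()> {
    let observer = SlowQueryObserver::new(Duration::from_millis(1));
    let logical = LogicalPlan("Projection: t.k, t.v".to_owned());
    let physical = PhysicalPlan("DataSourceExec".to_owned());

    observer.plan_created("q1", &logical, &physical)?;
    observer.plan_executed("q1", ExplainOutput(String::new()), Duration::from_millis(5))?;
    println!("slow plans: {:?}", observer.slow_ids());
    Ok(())
}
```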
   
   The `AnalyzeExec` operator can now also receive a callback to which it 
passes its result. I opted for this approach so that the physical operators do 
not have to depend on `PlanObserver`. This way we can reuse the code in the 
analyze operator to provide the already formatted output.
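The decoupling idea can be sketched in isolation as follows. This is not the code from the PR: `AnalyzeStage`, `OnAnalyzed` and `with_callback` are hypothetical names, and the "operator" just counts matching integers. The point is only that the operator knows about a plain closure, not about any observer trait, and hands it the already formatted report.

```rust
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

// Hypothetical callback type: receives the formatted report and the elapsed time.
pub type OnAnalyzed = Arc<dyn Fn(&str, Duration) + Send + Sync>;

// Toy stand-in for an analyze operator; it has no dependency on an observer trait.
pub struct AnalyzeStage {
    callback: Option<OnAnalyzed>,
}

impl AnalyzeStage {
    pub fn new() -> Self {
        Self { callback: None }
    }

    // Optionally attach a callback that is invoked once the report is ready.
    pub fn with_callback(mut self, callback: OnAnalyzed) -> Self {
        self.callback = Some(callback);
        self
    }

    // "Executes" a filter (k = 1 OR k = 2), formats the metrics, then notifies
    // the callback, if any, with the same formatted text it returns.
    pub fn run(&self, rows: &[i32]) -> String {
        let start = Instant::now();
        let output_rows = rows.iter().filter(|k| **k == 1 || **k == 2).count();
        let report = format!("FilterExec: metrics=[output_rows={output_rows}]");
        if let Some(callback) = &self.callback {
            callback(&report, start.elapsed());
        }
        report
    }
}

fn main() {
    // The caller (for example an observer) decides what to do with the report.
    let seen = Arc::new(Mutex::new(Vec::new()));
    let sink = Arc::clone(&seen);
    let stage = AnalyzeStage::new().with_callback(Arc::new(
        move |report: &str, _elapsed: Duration| {
            sink.lock().unwrap().push(report.to_owned());
        },
    ));

    stage.run(&[1, 2, 3]);
    println!("{:?}", seen.lock().unwrap());
}
```

With this shape, the session layer can wrap an observer's `plan_executed` in the closure, so the dependency points from the session to the operator and not the other way around.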
   
   It can be used like this:
   
   ```rust
   let plan_observer = DefaultPlanObserver::new("auto_explain.txt".to_owned(), 0);
   let ctx = SessionContext::new().with_plan_observer(Arc::new(plan_observer));
   ctx.sql("create table t (k int, v int)").await?.collect().await?;
   
   // auto explain needs to be enabled
   ctx.sql("set datafusion.explain.auto_explain = true").await?.collect().await?;
   
   ctx.sql("select * from t where k = 1 or k = 2 order by v desc limit 5").await?.collect().await?;
   ```
   
   The `DefaultPlanObserver` writes its output either through the `log` crate 
or to a file. The output looks like this:
   
   ```sql
   QUERY: SELECT t.k, t.v FROM t WHERE ((t.k = 1) OR (t.k = 2)) ORDER BY t.v DESC NULLS FIRST LIMIT 5
   DURATION: 0.689ms
   EXPLAIN:
   +-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | plan_type         | plan                                                                                                                                                                                |
   +-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | Plan with Metrics | SortExec: TopK(fetch=5), expr=[v@1 DESC], preserve_partitioning=[false], metrics=[output_rows=0, elapsed_compute=13.40µs, output_bytes=0.0 B, output_batches=0, row_replacements=0] |
   |                   |   FilterExec: k@0 = 1 OR k@0 = 2, metrics=[output_rows=0, elapsed_compute=1ns, output_bytes=0.0 B, output_batches=0, selectivity=N/A (0/0)]                                         |
   |                   |     DataSourceExec: partitions=1, partition_sizes=[0], metrics=[]                                                                                                                   |
   |                   |                                                                                                                                                                                     |
   +-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   ```
   
   (If the `sql` feature is not enabled, the SQL query is not written.)
   
   Let me know what you think about the API.

