alamb commented on pull request #7959:
URL: https://github.com/apache/arrow/pull/7959#issuecomment-674208715


   Thanks @jorgecarleitao 
   
   > #### Physical Plan?
   > Looking at `CreateExternalTable`, we do not construct a physical plan out of it. Wouldn't it be simpler not to create a physical plan for `EXPLAIN` either, as we do for `CreateExternalTable`, since we do not need to multi-thread it or run it through partitions or batches?
   
   Yes, you are right that we could special-case `EXPLAIN` so that it does not actually run a physical plan. However, I think there are some benefits to using a physical plan:
   
   1. Immediate support for all other input/output channels, not just the CLI. For example, if there is ever a client library for DataFusion, it will not need any additional logic to handle `EXPLAIN`.
   2. Using or storing the plans themselves as data (imagine `INSERT INTO foo EXPLAIN SELECT * FROM bar`).
   
   I admit these are both forward-looking features with likely no practical import in the near term. However, I felt that making `Explain` a first-class part of both the logical and physical plans would reduce special cases and make the code "cleaner" (though this is just my opinion).
   
   > #### Plans as `{:#?}`
   > If I read correctly, we currently return the result of the explain as the debug (`{:#?}`) format of the plans. Shouldn't we reserve the debug format for our internal representation, and introduce a new format for `EXPLAIN` (not necessarily in this PR)?
   
   I am torn on this and have no strong opinion; I could go either way. Most of the information typically found in explain plans is also useful to developers, so there would always be substantial overlap between a `Debug` format and an explain format. But maybe the differences between them could justify the duplication? I am not sure.
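
If the two formats were ever separated, one option is to keep the derived `Debug` for developers and implement `Display` as the user-facing explain format. The sketch below uses a hypothetical two-node `LogicalPlan`, which is not DataFusion's real type. `fmt_indent` and `example_plan` are illustrative helpers. The output is one indented line per operator.

```rust
use std::fmt;

/// Hypothetical simplified plan; `Debug` is derived for internal/developer use.
#[derive(Debug)]
enum LogicalPlan {
    TableScan { table_name: String, projection: Option<Vec<usize>> },
    Filter { predicate: String, input: Box<LogicalPlan> },
}

impl LogicalPlan {
    /// Writes one line per node, indented two spaces per level of depth.
    fn fmt_indent(&self, f: &mut fmt::Formatter<'_>, depth: usize) -> fmt::Result {
        let pad = "  ".repeat(depth);
        match self {
            LogicalPlan::TableScan { table_name, projection } => {
                writeln!(f, "{}TableScan: {} projection={:?}", pad, table_name, projection)
            }
            LogicalPlan::Filter { predicate, input } => {
                writeln!(f, "{}Filter: {}", pad, predicate)?;
                input.fmt_indent(f, depth + 1)
            }
        }
    }
}

/// `Display` is the user-facing EXPLAIN format, independent of `Debug`.
impl fmt::Display for LogicalPlan {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        self.fmt_indent(f, 0)
    }
}

fn example_plan() -> LogicalPlan {
    LogicalPlan::Filter {
        predicate: "c1 > 10".to_string(),
        input: Box::new(LogicalPlan::TableScan {
            table_name: "foo".to_string(),
            projection: Some(vec![0]),
        }),
    }
}

fn main() {
    let plan = example_plan();
    print!("{}", plan); // compact EXPLAIN-style output
    println!("{:#?}", plan); // verbose developer output
}
```

With this split, the `Debug` output can change freely without affecting what users see from `EXPLAIN`. The price is keeping a second formatter in sync.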
   
   > #### Other formats?
   > One thing I dislike about Spark is that its `EXPLAIN` output is not easy to parse programmatically. One idea is to allow `explain` to return the result in JSON, which is useful, e.g., to construct a visual graph representation of the plan. We could support another string parameter, `format`, that defaults to `string` (not necessarily in this PR).
   
   I like this idea. What I have typically used in the past for such purposes is Graphviz (https://graphviz.org/) and its `dot` language. I have filed https://issues.apache.org/jira/browse/ARROW-9746 to track the improvement.
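
Here is a rough sketch of a `format` parameter that defaults to `string`, together with a DOT renderer for a plan tree. `PlanNode`, `ExplainFormat`, `to_dot`, `visit` and `example_plan` are all hypothetical names, not existing DataFusion APIs. Only Graphviz's `digraph` syntax is real. Each operator becomes a node, and an edge runs from each input to the operator that consumes it.

```rust
use std::fmt::Write;

/// Hypothetical plan node for illustration: a display name plus its inputs.
struct PlanNode {
    name: String,
    inputs: Vec<PlanNode>,
}

/// Hypothetical EXPLAIN output formats; `Text` is the default ("string").
#[derive(Debug, PartialEq)]
enum ExplainFormat {
    Text,
    Dot,
}

impl ExplainFormat {
    /// Parses the optional `format` parameter, defaulting to "string".
    fn parse(s: Option<&str>) -> Result<Self, String> {
        match s.unwrap_or("string") {
            "string" => Ok(ExplainFormat::Text),
            "dot" => Ok(ExplainFormat::Dot),
            other => Err(format!("unsupported EXPLAIN format: {}", other)),
        }
    }
}

/// Renders the plan tree as a Graphviz `digraph`.
fn to_dot(root: &PlanNode) -> String {
    let mut out = String::from("digraph plan {\n");
    let mut next_id = 0;
    visit(root, &mut next_id, &mut out);
    out.push_str("}\n");
    out
}

/// Emits `node` and its subtree (pre-order ids); returns the id of `node`.
fn visit(node: &PlanNode, next_id: &mut usize, out: &mut String) -> usize {
    let id = *next_id;
    *next_id += 1;
    // Escape double quotes so the label stays a valid DOT string.
    let label = node.name.replace('"', "\\\"");
    // Writing to a String cannot fail, so unwrap is safe here.
    writeln!(out, "  n{} [label=\"{}\"];", id, label).unwrap();
    for child in &node.inputs {
        let child_id = visit(child, next_id, out);
        writeln!(out, "  n{} -> n{};", child_id, id).unwrap();
    }
    id
}

fn example_plan() -> PlanNode {
    let scan = PlanNode { name: "TableScan: bar".to_string(), inputs: vec![] };
    let filter = PlanNode { name: "Filter: a > 1".to_string(), inputs: vec![scan] };
    PlanNode { name: "Projection: a".to_string(), inputs: vec![filter] }
}

fn main() {
    match ExplainFormat::parse(Some("dot")) {
        Ok(ExplainFormat::Dot) => print!("{}", to_dot(&example_plan())),
        Ok(ExplainFormat::Text) => println!("(text format not sketched here)"),
        Err(e) => eprintln!("{}", e),
    }
}
```

Saving the output as `plan.dot` and running `dot -Tpng plan.dot -o plan.png` would render the tree as an image. A JSON variant could reuse the same recursive walk.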
   
   > #### Support on `table` API?
   > We should consider adding `.explain` to the API in `table.rs`.
   
   Good idea. I defer to @andygrove on how he sees the DataFusion CLI evolving, i.e., whether he wants to mirror SQLite-style commands such as `.table` or `.quit`, keep the current `quit` command, or do something else entirely.
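
Here is one way the `table.rs` suggestion could look, as a sketch under stated assumptions. The `Table` trait and `TableImpl` below are hypothetical stand-ins for the DataFrame-style API, and `LogicalPlan` is simplified; none of this is DataFusion's actual code. The method returns a new table whose plan wraps the current one in an `Explain` node, so the Table API and SQL `EXPLAIN` would build the same plan.

```rust
use std::sync::Arc;

/// Hypothetical simplified logical plan; not DataFusion's real type.
#[derive(Debug, Clone)]
enum LogicalPlan {
    TableScan { table_name: String },
    Explain { verbose: bool, plan: Arc<LogicalPlan> },
}

/// Hypothetical stand-in for the DataFrame-style trait in `table.rs`.
trait Table {
    fn to_logical_plan(&self) -> LogicalPlan;
    /// Returns a new table whose plan wraps this one in an `Explain` node.
    fn explain(&self, verbose: bool) -> Arc<dyn Table>;
}

struct TableImpl {
    plan: LogicalPlan,
}

impl Table for TableImpl {
    fn to_logical_plan(&self) -> LogicalPlan {
        self.plan.clone()
    }

    fn explain(&self, verbose: bool) -> Arc<dyn Table> {
        Arc::new(TableImpl {
            plan: LogicalPlan::Explain {
                verbose,
                plan: Arc::new(self.plan.clone()),
            },
        })
    }
}

fn main() {
    let table = TableImpl {
        plan: LogicalPlan::TableScan { table_name: "bar".to_string() },
    };
    // Executing this plan would go through the same path as SQL `EXPLAIN`.
    println!("{:?}", table.explain(true).to_logical_plan());
}
```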
   

