alamb commented on pull request #7959:
URL: https://github.com/apache/arrow/pull/7959#issuecomment-674208715
Thanks @jorgecarleitao
> #### Physical Plan?
> Looking at CreateExternalTable, we do not construct a physical plan out of
> it. Wouldn't it be simpler if we do not create a physical plan, like we do for
> CreateExternalTable, since we do not need to multi-thread it nor run through
> partitions nor batches?
Yes, you are right that we could special-case `EXPLAIN` so that it does not
actually run a physical plan. I was thinking there are some benefits to using a
physical plan, as follows:
1. Immediate support for all other input/output channels (not just the CLI)
-- so for example if there is ever a client library for DataFusion, it will not
need any additional logic to handle `EXPLAIN`
2. Use / store the plans themselves as data (imagine `INSERT INTO foo
EXPLAIN select * from bar`).
I admit these are both forward-looking features with likely no practical
import in the near term. However, I felt making `Explain` a first-class part of
both logical and physical plans would reduce special cases and make the code
"cleaner" (though this is just my opinion).
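
To make the "first-class" idea concrete, here is a minimal sketch, not the code in this PR. All types and functions below (`LogicalPlan`, `PhysicalPlan`, `create_physical_plan`, `execute`) are simplified, hypothetical stand-ins rather than DataFusion's real definitions. The point is only that an `Explain` node flows through the same planning and execution path as any other node and produces ordinary rows:

```rust
/// Hypothetical, heavily simplified logical plan (not DataFusion's real enum).
#[derive(Debug)]
enum LogicalPlan {
    /// Scan of a named table.
    TableScan { table: String },
    /// `EXPLAIN <input>`: wraps the plan being explained.
    Explain { input: Box<LogicalPlan> },
}

/// Hypothetical physical plan; every variant yields rows when executed.
#[derive(Debug)]
enum PhysicalPlan {
    Scan { table: String },
    /// Holds the already rendered plan text as (plan_type, plan) rows.
    Explain { rows: Vec<Vec<String>> },
}

/// Planner: `Explain` is handled like any other node, with no side channel.
fn create_physical_plan(plan: &LogicalPlan) -> PhysicalPlan {
    match plan {
        LogicalPlan::TableScan { table } => PhysicalPlan::Scan {
            table: table.clone(),
        },
        LogicalPlan::Explain { input } => {
            let physical = create_physical_plan(input);
            PhysicalPlan::Explain {
                rows: vec![
                    vec!["logical_plan".to_string(), format!("{:#?}", input)],
                    vec!["physical_plan".to_string(), format!("{:#?}", physical)],
                ],
            }
        }
    }
}

/// Executor: any caller (CLI, client library, ...) just receives rows.
fn execute(plan: &PhysicalPlan) -> Vec<Vec<String>> {
    match plan {
        // Placeholder data; a real scan would read batches from the table.
        PhysicalPlan::Scan { table } => vec![vec![format!("row from {}", table)]],
        PhysicalPlan::Explain { rows } => rows.clone(),
    }
}
```

Under these assumptions, a caller that already knows how to run a plan and print its rows needs no `EXPLAIN`-specific logic, and the rows could in principle be fed into another operator such as an insert.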
> #### Plans as {:#?}
> If I read correctly, we currently return the result of the explain as the
> debug (`{:#?}`) of the plans. Shouldn't we reserve the debug format for our
> internal representation, and introduce a new format to `EXPLAIN` (not
> necessarily in this PR)?
I am torn on this and have no strong opinion. I could go either way -- I
think that most of the information that is typically in explain plans is useful
for developers, and thus there would always be substantial overlap between a
Debug format and an explain format. But maybe the differences between them
could justify the duplication? I am not sure.
> #### Other formats?
> One thing I dislike about spark is that EXPLAIN is in a not-so-nice format
> to parse programmatically. One idea is to allow `explain` to return the result
> in json. This is e.g. useful to construct a visual graph representation of the
> plan. One idea would be to support another string parameter, `format`, and make
> it default to `string` (not necessarily in this PR).
I like this idea. The typical thing I have used in the past for such
purposes is https://graphviz.org/ and `dot` -- I have filed
https://issues.apache.org/jira/browse/ARROW-9746 to track the improvement.
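
As a rough illustration of what `dot` output for a plan could look like, here is a minimal sketch. The `PlanNode` type and the `to_dot` / `write_node` functions are hypothetical and are not part of DataFusion or of the linked ticket; the sketch only shows a plan tree being walked and emitted in Graphviz's DOT language:

```rust
/// Hypothetical plan tree node: a display label plus child nodes.
struct PlanNode {
    label: String,
    children: Vec<PlanNode>,
}

/// Render the plan tree rooted at `root` as a Graphviz `digraph`.
fn to_dot(root: &PlanNode) -> String {
    let mut out = String::from("digraph plan {\n");
    let mut next_id = 0;
    write_node(root, &mut next_id, &mut out);
    out.push_str("}\n");
    out
}

/// Emit one node, then its children and the edges to them (pre-order).
/// Returns the numeric id assigned to `node`.
fn write_node(node: &PlanNode, next_id: &mut usize, out: &mut String) -> usize {
    let id = *next_id;
    *next_id += 1;
    // Escape backslashes and double quotes so the label stays a valid DOT string.
    let label = node.label.replace('\\', "\\\\").replace('"', "\\\"");
    out.push_str(&format!("  n{} [label=\"{}\"];\n", id, label));
    for child in &node.children {
        let child_id = write_node(child, next_id, out);
        out.push_str(&format!("  n{} -> n{};\n", id, child_id));
    }
    id
}
```

The resulting string can be saved to a file and rendered with, for example, `dot -Tpng plan.dot -o plan.png`.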
> #### Support on `table` API?
> We should consider adding `.explain` on the `table.rs`.
Good idea -- I defer to @andygrove on how he sees the DataFusion CLI
evolving, i.e. whether he wants to mirror the SQLite-style `.table` or `.quit`
commands, keep the current `quit` command, or do something else entirely.