alamb opened a new pull request #8619:
URL: https://github.com/apache/arrow/pull/8619


   # Rationale:
   I have been tracking down potential issues in DataFusion for my work project, and I have found myself wanting to print out the state of the logical_plan several times. The existing debug formatting is ok, but it is missing a few key items:
   
   1. Schema information (as in when did columns appear / disappear in the plan)
   2. A visual representation (graphviz)
   
   
   # Open questions:
   1. Would it be better to split the visitor into `visitor.rs` and display 
code into `display.rs`? I am torn -- this is all logically part of 
logical_plan, but the module is getting kind of big.
   
   # Changes:
   
   This PR adds several formatting options for logical plans in addition to the existing indented format. Examples are included below.
   
   To do so it also provides a generalized "Visitor" pattern for walking 
logical plan nodes, as well as a general pattern to display logical plan nodes 
with multiple potential formats.
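
To make the visitor idea concrete, here is a minimal, self-contained sketch. It is not the code in this PR: `PlanNode`, `PlanVisitor`, `IndentVisitor`, `render`, and `example_plan` are hypothetical names over a toy plan type, chosen only to show how one walk can drive more than one display format.

```rust
/// Toy stand-in for a logical plan node (hypothetical, not DataFusion's `LogicalPlan`).
struct PlanNode {
    label: &'static str,
    /// (column name, type name) pairs describing the node's output schema.
    schema: Vec<(&'static str, &'static str)>,
    inputs: Vec<PlanNode>,
}

/// Hypothetical visitor: `pre_visit` runs before a node's inputs, `post_visit` after.
/// Returning `false` from either method stops the walk early.
trait PlanVisitor {
    fn pre_visit(&mut self, node: &PlanNode) -> bool;
    fn post_visit(&mut self, node: &PlanNode) -> bool;
}

impl PlanNode {
    /// Depth-first walk that calls the visitor around each node's inputs.
    fn accept<V: PlanVisitor>(&self, visitor: &mut V) -> bool {
        if !visitor.pre_visit(self) {
            return false;
        }
        for input in &self.inputs {
            if !input.accept(visitor) {
                return false;
            }
        }
        visitor.post_visit(self)
    }
}

/// Builds an indented text dump, optionally appending each node's schema.
struct IndentVisitor {
    with_schema: bool,
    depth: usize,
    out: String,
}

impl PlanVisitor for IndentVisitor {
    fn pre_visit(&mut self, node: &PlanNode) -> bool {
        if !self.out.is_empty() {
            self.out.push('\n');
        }
        // Two spaces of indentation per level of depth.
        self.out.push_str(&"  ".repeat(self.depth));
        self.out.push_str(node.label);
        if self.with_schema {
            let cols: Vec<String> = node
                .schema
                .iter()
                .map(|(name, ty)| format!("{}:{}", name, ty))
                .collect();
            self.out.push_str(&format!(" [{}]", cols.join(", ")));
        }
        self.depth += 1;
        true
    }

    fn post_visit(&mut self, _node: &PlanNode) -> bool {
        self.depth -= 1;
        true
    }
}

/// Render a plan as indented text, with or without schema information.
fn render(plan: &PlanNode, with_schema: bool) -> String {
    let mut visitor = IndentVisitor { with_schema, depth: 0, out: String::new() };
    plan.accept(&mut visitor);
    visitor.out
}

/// Toy plan mirroring the Projection / Filter / TableScan example in this description.
fn example_plan() -> PlanNode {
    let wide = vec![("id", "Int32"), ("state", "Utf8")];
    let scan = PlanNode {
        label: "TableScan: employee.csv projection=Some([0, 3])",
        schema: wide.clone(),
        inputs: vec![],
    };
    let filter = PlanNode {
        label: "Filter: #state Eq Utf8(\"CO\")",
        schema: wide,
        inputs: vec![scan],
    };
    PlanNode { label: "Projection: #id", schema: vec![("id", "Int32")], inputs: vec![filter] }
}
```

Calling `render(&example_plan(), false)` gives the plain indented form, and passing `true` appends the schema to each line. The same `accept` walk serves both, which is the point of separating traversal from formatting.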
   
   Note it should be straightforward to get this wired up into EXPLAIN as well: https://issues.apache.org/jira/browse/ARROW-9746
   
   
   ## Existing Formatting
   Here is what master currently allows:
   
   ```
   Projection: #id
     Filter: #state Eq Utf8("CO")
       CsvScan: employee.csv projection=Some([0, 3])
   ```
   
   ## With Schema Information
   This PR adds a dump with schema information:
   
   ```
   Projection: #id [id:Int32]
     Filter: #state Eq Utf8("CO") [id:Int32, state:Utf8]
       TableScan: employee.csv projection=Some([0, 3]) [id:Int32, state:Utf8]
   ```
   
   ## As Graphviz
   
   This PR adds the ability to display plans using [Graphviz](http://www.graphviz.org).
   
   Here is an example GraphViz plan that comes out:
   ```
   // Begin DataFusion GraphViz Plan (see https://graphviz.org)
   digraph {
     subgraph cluster_1
     {
       graph[label="LogicalPlan"]
       2[label="Projection: #id"]
       3[label="Filter: #state Eq Utf8(_CO_)"]
       2 -> 3 [arrowhead=none, arrowtail=normal, dir=back]
       4[label="TableScan: employee.csv projection=Some([0, 3])"]
       3 -> 4 [arrowhead=none, arrowtail=normal, dir=back]
     }
     subgraph cluster_5
     {
       graph[label="Detailed LogicalPlan"]
       6[label="Projection: #id\nSchema: [id:Int32]"]
       7[label="Filter: #state Eq Utf8(_CO_)\nSchema: [id:Int32, state:Utf8]"]
       6 -> 7 [arrowhead=none, arrowtail=normal, dir=back]
       8[label="TableScan: employee.csv projection=Some([0, 3])\nSchema: [id:Int32, state:Utf8]"]
       7 -> 8 [arrowhead=none, arrowtail=normal, dir=back]
     }
   }
   // End DataFusion GraphViz Plan
   ```
   
   Here is what that looks like rendered:
   <img width="1679" alt="Screen Shot 2020-11-09 at 2 30 07 PM" src="https://user-images.githubusercontent.com/490673/98606322-0f891880-22b5-11eb-8e1c-669ce85f0f52.png">
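
For readers curious how output in this shape could be produced, the sketch below emits the first cluster of a DOT graph from a toy plan. It is an illustration under stated assumptions, not the implementation in this PR: `Node`, `emit`, `to_graphviz`, and `sample_plan` are hypothetical names. Numbering nodes in pre-order starting at 2 and replacing double quotes with `_` are guesses based on the sample output above.

```rust
/// Toy plan node (hypothetical; not DataFusion's `LogicalPlan`).
struct Node {
    label: &'static str,
    inputs: Vec<Node>,
}

/// Emit one node statement, then an edge from its parent, then recurse into inputs.
fn emit(node: &Node, parent: Option<usize>, next_id: &mut usize, out: &mut String) {
    let id = *next_id;
    *next_id += 1;
    // Assumption: quotes are swapped for `_` so the label remains a valid DOT string.
    let label = node.label.replace('"', "_");
    out.push_str(&format!("    {}[label=\"{}\"]\n", id, label));
    if let Some(p) = parent {
        // `dir=back` draws the arrow from input to consumer, as in the sample above.
        out.push_str(&format!(
            "    {} -> {} [arrowhead=none, arrowtail=normal, dir=back]\n",
            p, id
        ));
    }
    for input in &node.inputs {
        emit(input, Some(id), next_id, out);
    }
}

/// Wrap the emitted nodes and edges in a single labelled cluster.
fn to_graphviz(root: &Node) -> String {
    let mut out = String::from("digraph {\n  subgraph cluster_1\n  {\n");
    out.push_str("    graph[label=\"LogicalPlan\"]\n");
    // Assumption: id 1 is taken by the cluster itself, so nodes start at 2.
    let mut next_id = 2;
    emit(root, None, &mut next_id, &mut out);
    out.push_str("  }\n}\n");
    out
}

/// Toy plan mirroring the Projection / Filter / TableScan example.
fn sample_plan() -> Node {
    let scan = Node { label: "TableScan: employee.csv projection=Some([0, 3])", inputs: vec![] };
    let filter = Node { label: "Filter: #state Eq Utf8(\"CO\")", inputs: vec![scan] };
    Node { label: "Projection: #id", inputs: vec![filter] }
}
```

Writing the string returned by `to_graphviz(&sample_plan())` to a file such as `plan.dot` and running `dot -Tpdf plan.dot -o plan.pdf` should render it with a standard Graphviz install.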
   
   

