Re: [PR] Implement tree explain for DataSourceExec [datafusion]

via GitHub Wed, 05 Mar 2025 07:03:32 -0800


alamb commented on code in PR #15029:
URL: https://github.com/apache/datafusion/pull/15029#discussion_r1981578280



##########
datafusion/datasource-csv/src/source.rs:
##########
@@ -617,8 +617,13 @@ impl FileSource for CsvSource {
     fn file_type(&self) -> &str {
         "csv"
     }
-    fn fmt_extra(&self, _t: DisplayFormatType, f: &mut fmt::Formatter) -> fmt::Result {
-        write!(f, ", has_header={}", self.has_header)
+    fn fmt_extra(&self, t: DisplayFormatType, f: &mut fmt::Formatter) -> fmt::Result {
+        match t {
+            DisplayFormatType::Default | DisplayFormatType::Verbose => {
+                write!(f, ", has_header={}", self.has_header)
+            }
+            DisplayFormatType::TreeRender => Ok(()),

Review Comment:
   Per the description of TreeRender:
   
https://github.com/apache/datafusion/blob/3dc212c9078c92f57ab7f58e75e1258130c772d0/datafusion/physical-plan/src/display.rs#L48-L74
   
   TreeRender mode should include only the details most relevant to understanding the high-level plan.
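
   As a minimal, self-contained sketch of the pattern in this hunk: the types below are hypothetical stand-ins, not DataFusion's real `DisplayFormatType` or `CsvSource`, and the `Shown` adapter and `render` helper exist only so the method can be driven through `format!`. The sketch assumes that the tree mode should print nothing extra for CSV, while the other two modes keep the `has_header` detail.

```rust
use std::fmt;

// Hypothetical stand-in for DataFusion's `DisplayFormatType`; only the three
// variants that appear in the diff above are modeled.
#[derive(Clone, Copy)]
enum DisplayFormatType {
    Default,
    Verbose,
    TreeRender,
}

// Hypothetical stand-in for `CsvSource`, reduced to the one field used here.
struct CsvSource {
    has_header: bool,
}

impl CsvSource {
    // Mirrors the shape of the method in the diff: extra details are written
    // for the detailed modes and skipped for the tree mode.
    fn fmt_extra(&self, t: DisplayFormatType, f: &mut fmt::Formatter) -> fmt::Result {
        match t {
            DisplayFormatType::Default | DisplayFormatType::Verbose => {
                write!(f, ", has_header={}", self.has_header)
            }
            DisplayFormatType::TreeRender => Ok(()),
        }
    }
}

// Illustrative adapter (an assumption of this sketch) that pairs a source
// with a display mode so it can be formatted with `{}`.
struct Shown<'a>(&'a CsvSource, DisplayFormatType);

impl fmt::Display for Shown<'_> {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        self.0.fmt_extra(self.1, f)
    }
}

// Illustrative helper: render the extra details for a given mode.
fn render(source: &CsvSource, t: DisplayFormatType) -> String {
    format!("{}", Shown(source, t))
}
```

   Matching on every variant explicitly, rather than using a wildcard arm, means the compiler flags this method if a new display mode is added later.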
   



##########
datafusion/datasource/src/memory.rs:
##########
@@ -425,25 +425,20 @@ impl DataSource for MemorySourceConfig {
                 }
             }
             DisplayFormatType::TreeRender => {
-                let partition_sizes: Vec<_> =
-                    self.partitions.iter().map(|b| b.len()).collect();
-                writeln!(f, "partition_sizes={:?}", partition_sizes)?;
-
-                if let Some(output_ordering) = self.sort_information.first() {
-                    writeln!(f, "output_ordering={}", output_ordering)?;
-                }
-
-                let eq_properties = self.eq_properties();
-                let constraints = eq_properties.constraints();
-                if !constraints.is_empty() {
-                    writeln!(f, "constraints={}", constraints)?;
-                }
-
-                if let Some(limit) = self.fetch {
-                    writeln!(f, "fetch={}", limit)?;
-                }
-
-                write!(f, "partitions={}", partition_sizes.len())
+                let total_rows = self.partitions.iter().map(|b| b.len()).sum::<usize>();

Review Comment:
   Likewise, I think the previous version is too verbose for TreeRender.
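
   A rough sketch of what a compact summary could look like, under stated assumptions: `Batch` and `tree_summary` are hypothetical names, the `partitions=…, rows=…` output format is invented for illustration, and each partition is assumed to be a list of batches that each carry their own row count. Under that assumption, a row total has to sum over both levels, whereas summing the length of each partition would count batches.

```rust
// Hypothetical stand-in for a record batch; only the row count is modeled.
struct Batch {
    num_rows: usize,
}

// Build a one-line summary for a tree-style plan node.
// Assumes each partition is a list of batches (an assumption of this sketch).
fn tree_summary(partitions: &[Vec<Batch>]) -> String {
    // Sum the rows of every batch in every partition.
    let total_rows: usize = partitions
        .iter()
        .flat_map(|batches| batches.iter())
        .map(|batch| batch.num_rows)
        .sum();
    format!("partitions={}, rows={}", partitions.len(), total_rows)
}
```

   Per-partition sizes, orderings, and constraints are left out here on purpose; in this sketch they would belong to the more detailed display modes.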



##########
datafusion/sqllogictest/test_files/explain_tree.slt:
##########
@@ -213,7 +252,156 @@ physical_plan
 12)└─────────────┬─────────────┘
 13)┌─────────────┴─────────────┐
 14)│       DataSourceExec      │
-15)└───────────────────────────┘
+15)│    --------------------   │
+16)│          files: 1         │
+17)│        format: csv        │
+18)└───────────────────────────┘
+
+# Query with filter on csv
+query TT
+explain SELECT int_col FROM table2 WHERE string_col != 'foo';
+----
+logical_plan
+01)Projection: table2.int_col
+02)--Filter: table2.string_col != Utf8View("foo")
+03)----TableScan: table2 projection=[int_col, string_col], partial_filters=[table2.string_col != Utf8View("foo")]
+physical_plan
+01)┌───────────────────────────┐
+02)│    CoalesceBatchesExec    │
+03)└─────────────┬─────────────┘
+04)┌─────────────┴─────────────┐
+05)│         FilterExec        │
+06)│    --------------------   │
+07)│         predicate:        │
+08)│    string_col@1 != foo    │
+09)└─────────────┬─────────────┘
+10)┌─────────────┴─────────────┐
+11)│      RepartitionExec      │
+12)└─────────────┬─────────────┘
+13)┌─────────────┴─────────────┐
+14)│       DataSourceExec      │
+15)│    --------------------   │
+16)│          files: 1         │
+17)│      format: parquet      │
+18)│                           │
+19)│         predicate:        │
+20)│    string_col@1 != foo    │
+21)└───────────────────────────┘
+
+
+# Query with filter on parquet
+query TT
+explain SELECT int_col FROM table2 WHERE string_col != 'foo';
+----
+logical_plan
+01)Projection: table2.int_col
+02)--Filter: table2.string_col != Utf8View("foo")
+03)----TableScan: table2 projection=[int_col, string_col], partial_filters=[table2.string_col != Utf8View("foo")]
+physical_plan
+01)┌───────────────────────────┐
+02)│    CoalesceBatchesExec    │
+03)└─────────────┬─────────────┘
+04)┌─────────────┴─────────────┐
+05)│         FilterExec        │
+06)│    --------------------   │
+07)│         predicate:        │
+08)│    string_col@1 != foo    │
+09)└─────────────┬─────────────┘
+10)┌─────────────┴─────────────┐
+11)│      RepartitionExec      │
+12)└─────────────┬─────────────┘
+13)┌─────────────┴─────────────┐
+14)│       DataSourceExec      │
+15)│    --------------------   │
+16)│          files: 1         │
+17)│      format: parquet      │
+18)│                           │

Review Comment:
   I don't know why there is an extra newline here 🤔 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
