[GitHub] [arrow-datafusion] alamb commented on a change in pull request #965: Move CBOs and Statistics to physical plan

GitBox Sun, 12 Sep 2021 04:34:47 -0700


alamb commented on a change in pull request #965:
URL: https://github.com/apache/arrow-datafusion/pull/965#discussion_r706823335




##########
File path: datafusion/src/physical_plan/mod.rs
##########
@@ -89,6 +89,34 @@ impl Stream for EmptyRecordBatchStream {
 /// Physical planner interface
 pub use self::planner::PhysicalPlanner;
 
+/// Statistics for an physical plan node
+/// Fields are optional and can be inexact because the sources
+/// sometimes provide approximate estimates for performance reasons
+/// and the transformations output are not always predictable.
+#[derive(Debug, Clone, Default, PartialEq)]
+pub struct Statistics {
+    /// The number of table rows
+    pub num_rows: Option<usize>,
+    /// total byte of the table rows
+    pub total_byte_size: Option<usize>,
+    /// Statistics on a column level
+    pub column_statistics: Option<Vec<ColumnStatistics>>,
+    /// Some datasources or transformations might provide inexact estimates
+    pub is_exact: bool,
+}
+/// This table statistics are estimates about column
+#[derive(Clone, Debug, Default, PartialEq)]
+pub struct ColumnStatistics {

Review comment:
       I agree -- I think the approach in this PR is a good step forward, and we 
can make it more sophisticated over time. 
   
   FWIW, I think in IOx (where we have a custom `TableProvider` and 
`ExecutionPlan`) we do what @yjshen is suggesting: we store statistics in 
our catalog and translate them to the DataFusion `Statistics` format when 
needed, so no data scan is required.
   
   
https://github.com/influxdata/influxdb_iox/blob/f42f0349ed435c431c5e60855eb8d95fd6e6b646/query/src/provider.rs#L270
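
   As a rough illustration of the catalog-to-`Statistics` translation described 
above, here is a minimal sketch. It is not the IOx code. `CatalogTableSummary`, 
`CatalogColumnSummary`, and `to_df_statistics` are hypothetical names, the 
`Statistics` struct is a local copy of the one in the diff, and the `null_count` 
field on `ColumnStatistics` is an assumption because the quoted diff cuts off 
before that struct's fields.

```rust
use std::convert::TryFrom;

/// Local copy of the `Statistics` struct quoted in the diff above.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Statistics {
    pub num_rows: Option<usize>,
    pub total_byte_size: Option<usize>,
    pub column_statistics: Option<Vec<ColumnStatistics>>,
    pub is_exact: bool,
}

/// Stand-in for the PR's `ColumnStatistics`. The quoted diff ends before its
/// fields, so `null_count` here is an assumption made for illustration.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ColumnStatistics {
    pub null_count: Option<usize>,
}

/// Hypothetical per-column summary kept in a catalog.
#[derive(Debug, Clone)]
pub struct CatalogColumnSummary {
    pub name: String,
    pub null_count: u64,
}

/// Hypothetical per-table summary kept in a catalog.
#[derive(Debug, Clone)]
pub struct CatalogTableSummary {
    pub row_count: u64,
    pub columns: Vec<CatalogColumnSummary>,
}

/// Translate catalog metadata into the planner-facing format.
/// Only metadata is read here; no data files are scanned.
pub fn to_df_statistics(summary: &CatalogTableSummary) -> Statistics {
    let column_statistics = summary
        .columns
        .iter()
        .map(|c| ColumnStatistics {
            // A count that does not fit in `usize` is reported as unknown.
            null_count: usize::try_from(c.null_count).ok(),
        })
        .collect();

    Statistics {
        num_rows: usize::try_from(summary.row_count).ok(),
        // This hypothetical catalog does not track byte sizes.
        total_byte_size: None,
        column_statistics: Some(column_statistics),
        // Assumes the catalog counts are exact rather than estimated.
        is_exact: true,
    }
}
```

   A custom `TableProvider` or `ExecutionPlan` could call a function like this 
when asked for statistics, leaving `None` for anything the catalog does not 
track.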




