rdettai commented on a change in pull request #965:
URL: https://github.com/apache/arrow-datafusion/pull/965#discussion_r706000623



##########
File path: datafusion/src/physical_plan/parquet.rs
##########
@@ -390,6 +387,10 @@ impl ExecutionPlan for ParquetExec {
     fn metrics(&self) -> Option<MetricsSet> {
         Some(self.metrics.clone_inner())
     }
+
+    fn statistics(&self) -> Statistics {
+        self.statistics.clone()

Review comment:
       For this specific implementation yes, but this is a trait method, and 
other implementations do create new instances of `Statistics` on the fly, which 
aren't bound to `&self`'s lifetime. To make this trait method return a 
reference we need either:
   - all `ExecutionPlan` instances to be immutable and always compute the 
statistics upon instantiation (also related to the [comment 
](https://github.com/apache/arrow-datafusion/pull/965#discussion_r705957698) 
from @houqp )
   - to make the method `&mut self` allowing it to bind newly constructed 
objects to `self`'s lifetime (in common people's terms: allow caching 😉)
   
   I have a personal preference for the first solution. I would open a separate 
PR to address it as it would require to fully review the immutability of all 
`ExecutionPlan` implems
   
   As a side note, this would also apply for the other trait methods: 
   - `schema()` should return &SchemaRef, 
   - `children()` should return &[Arc<dyn ExecutionPlan>] or &[&Arc<dyn 
ExecutionPlan>]
   - `output_partitioning()` should return &Partitioning




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to