rdettai commented on a change in pull request #965:
URL: https://github.com/apache/arrow-datafusion/pull/965#discussion_r706000623
##########
File path: datafusion/src/physical_plan/parquet.rs
##########
@@ -390,6 +387,10 @@ impl ExecutionPlan for ParquetExec {
fn metrics(&self) -> Option<MetricsSet> {
Some(self.metrics.clone_inner())
}
+
+ fn statistics(&self) -> Statistics {
+ self.statistics.clone()
Review comment:
For this specific implementation yes, but this is a trait method, and
other implementations do create new instances of `Statistics` on the fly, which
aren't bound to `&self`'s lifetime. To make this trait method return a
reference we need either:
- all `ExecutionPlan` instances to be immutable and always compute the
statistics upon instantiation (also related to the [comment
](https://github.com/apache/arrow-datafusion/pull/965#discussion_r705957698)
from @houqp )
- to make the method `&mut self` allowing it to bind newly constructed
objects to `self`'s lifetime (in common people's terms: allow caching 😉)
I have a personal preference for the first solution. I would open a separate
PR to address it as it would require to fully review the immutability of all
`ExecutionPlan` implems
As a side note, this would also apply for the other trait methods:
- `schema()` should return &SchemaRef,
- `children()` should return &[Arc<dyn ExecutionPlan>] or &[&Arc<dyn
ExecutionPlan>]
- `output_partitioning()` should return &Partitioning
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]