alamb opened a new issue, #11885: URL: https://github.com/apache/datafusion/issues/11885
### Is your feature request related to a problem or challenge?

We are trying to improve the speed of DataFusion when running the ClickBench partitioned test (which has 100 files), so the per-file overhead is important to reduce.

One structure that has non-trivial overhead is the `Statistics` structure: it has a `ScalarValue` for each column of each file, so there are at least 100 * (number of columns) * 2 `ScalarValue`s.

### Describe the solution you'd like

It would be great to reduce the overhead of passing around these values.

### Describe alternatives you've considered

One way to do so is to avoid copying them when the underlying `ParquetExec` is copied, by using an `Option<Arc<Statistics>>` here: https://github.com/apache/datafusion/blob/9503456388544788e1a881a0a80a3c61ac015a86/datafusion/core/src/datasource/listing/mod.rs#L81-L80

### Additional context

Interestingly @Rachelint https://github.com/apache/datafusion/pull/11802#issuecomment-2271924370
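
To make the `Option<Arc<Statistics>>` alternative concrete, here is a minimal sketch of why it avoids the copies. The types below are simplified, hypothetical stand-ins and not DataFusion's real definitions; the file names and the column count of 10 are also made up for illustration. The only point is that cloning a file entry whose statistics sit behind an `Arc` bumps a reference count instead of deep-copying every `ScalarValue`.

```rust
use std::sync::Arc;

// Hypothetical, simplified stand-ins for DataFusion's types
// (assumptions for illustration, not the real definitions).
#[derive(Debug, Clone, PartialEq)]
enum ScalarValue {
    Int64(Option<i64>),
}

#[derive(Debug, Clone)]
struct ColumnStatistics {
    min_value: ScalarValue,
    max_value: ScalarValue,
}

#[derive(Debug, Clone)]
struct Statistics {
    column_statistics: Vec<ColumnStatistics>,
}

#[derive(Debug, Clone)]
struct PartitionedFile {
    path: String,
    // Behind an Arc, cloning the file only increments a reference count;
    // the per-column ScalarValues are not copied.
    statistics: Option<Arc<Statistics>>,
}

// Build `num_files` entries, each with min/max statistics for `num_columns` columns.
fn make_files(num_files: usize, num_columns: usize) -> Vec<PartitionedFile> {
    (0..num_files)
        .map(|i| {
            let column_statistics = (0..num_columns)
                .map(|c| ColumnStatistics {
                    min_value: ScalarValue::Int64(Some(0)),
                    max_value: ScalarValue::Int64(Some(c as i64)),
                })
                .collect();
            PartitionedFile {
                path: format!("file_{i}.parquet"), // made-up file name
                statistics: Some(Arc::new(Statistics { column_statistics })),
            }
        })
        .collect()
}

// Number of ScalarValues held across all files: files * columns * 2 (min and max).
fn scalar_value_count(files: &[PartitionedFile]) -> usize {
    files
        .iter()
        .filter_map(|f| f.statistics.as_ref())
        .map(|s| s.column_statistics.len() * 2)
        .sum()
}

// True when both entries point at the same Statistics allocation.
fn shares_statistics(a: &PartitionedFile, b: &PartitionedFile) -> bool {
    match (&a.statistics, &b.statistics) {
        (Some(x), Some(y)) => Arc::ptr_eq(x, y),
        _ => false,
    }
}

fn main() {
    let files = make_files(100, 10);
    // Stands in for copying the plan node that owns the file list.
    let copied = files.clone();
    assert!(files.iter().zip(&copied).all(|(a, b)| shares_statistics(a, b)));
    println!(
        "{} files copied, {} ScalarValues shared rather than cloned",
        copied.len(),
        scalar_value_count(&files)
    );
}
```

With a plain `Statistics` field, the same `files.clone()` would clone every `ScalarValue`. The trade-off is that shared statistics can no longer be mutated in place without something like `Arc::make_mut`.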
