alamb opened a new issue, #10626: URL: https://github.com/apache/datafusion/issues/10626
### Is your feature request related to a problem or challenge?

Part of https://github.com/apache/datafusion/issues/10453

@Lordworms added a benchmark for extracting statistics from parquet files in https://github.com/apache/datafusion/pull/10610

As this code can be used to extract statistics from parquet files, we would like to make sure it is efficient (especially if we are going to extract statistics for many files at once).

The idea here is to improve the speed of the statistics extraction.

### Describe the solution you'd like

Make this benchmark go faster:

```shell
cargo bench --bench parquet_statistic
```

### Describe alternatives you've considered

I did some brief profiling. I think the key would be to change these loops so they build the required Arrow Arrays directly from primitive values rather than from `ScalarValue`:

https://github.com/apache/datafusion/blob/1bf7112171fd820c101e325822dc4d44dd65b2ff/datafusion/core/src/datasource/physical_plan/parquet/statistics.rs#L183-L189

### Additional context

_No response_
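
To illustrate the suggestion about building arrays directly from primitive values, here is a minimal, dependency-free sketch. It is not the DataFusion code: `Scalar` is a hypothetical stand-in for `ScalarValue`, `RowGroupStats` is a hypothetical holder for one row group's minimum, and a plain `Vec<Option<i64>>` stands in for an Arrow array. It only contrasts the two shapes of loop, one that wraps every value in a dynamically typed scalar first and one that collects the primitives directly.

```rust
// Hypothetical stand-in for a dynamically typed scalar such as `ScalarValue`.
#[allow(dead_code)]
enum Scalar {
    Int64(Option<i64>),
    Utf8(Option<String>),
}

// Hypothetical per-row-group statistics; `None` means the statistic is absent.
struct RowGroupStats {
    min: Option<i64>,
}

// Indirect shape: wrap each value in a `Scalar`, then match on every element
// again to unwrap it while building the output column.
fn mins_via_scalar(stats: &[RowGroupStats]) -> Vec<Option<i64>> {
    let scalars: Vec<Scalar> = stats.iter().map(|s| Scalar::Int64(s.min)).collect();
    scalars
        .into_iter()
        .map(|v| match v {
            Scalar::Int64(x) => x,
            // A value of the wrong type is treated as a missing statistic.
            _ => None,
        })
        .collect()
}

// Direct shape: collect the primitive values straight into the output column,
// with no intermediate allocation and no per-element type dispatch.
fn mins_direct(stats: &[RowGroupStats]) -> Vec<Option<i64>> {
    stats.iter().map(|s| s.min).collect()
}
```

Both functions return the same column for the same input; the direct version simply skips the intermediate vector and the per-element `match`. In the real code, the direct path would presumably collect into a typed Arrow array (for example, an `Int64Array` built from an iterator of `Option<i64>`) instead of a `Vec`.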