alamb commented on code in PR #6187:
URL: https://github.com/apache/arrow-rs/pull/6187#discussion_r1709943310


##########
parquet/src/arrow/arrow_reader/statistics.rs:
##########
@@ -1052,10 +1046,7 @@ fn max_statistics<'a, I: Iterator<Item = Option<&'a ParquetStatistics>>>(
 /// Extracts the min statistics from an iterator
 /// of parquet page [`Index`]'es to an [`ArrayRef`]
-pub(crate) fn min_page_statistics<'a, I>(
-    data_type: Option<&DataType>,
-    iterator: I,
-) -> Result<ArrayRef>
+pub(crate) fn min_page_statistics<'a, I>(data_type: &DataType, iterator: I) -> Result<ArrayRef>

Review Comment:
   I think you are right that `&[ParquetStatistics]` is likely to be the most commonly used iterator, and this generic formulation may simply be over-engineering.

   I harbor some idea that we will be able to use the iterator API to quickly extract statistics for many files at once into a single large array (e.g. by reading the data with `ParquetMetaDataReader`), so we can prune across 100s of files very fast. However, I will fully admit that no such code exists yet (either in our InfluxDB code or as an example) 🤔
