alamb commented on code in PR #6187:
URL: https://github.com/apache/arrow-rs/pull/6187#discussion_r1709943310


##########
parquet/src/arrow/arrow_reader/statistics.rs:
##########
@@ -1052,10 +1046,7 @@ fn max_statistics<'a, I: Iterator<Item = Option<&'a ParquetStatistics>>>(
 /// Extracts the min statistics from an iterator
 /// of parquet page [`Index`]'es to an [`ArrayRef`]
-pub(crate) fn min_page_statistics<'a, I>(
-    data_type: Option<&DataType>,
-    iterator: I,
-) -> Result<ArrayRef>
+pub(crate) fn min_page_statistics<'a, I>(data_type: &DataType, iterator: I) -> Result<ArrayRef>

Review Comment:
   I think you are right that `&[ParquetStatistics]` is likely to be the most commonly used iterator, and this generic formulation may simply be over-engineering.

   I harbor some idea that we will be able to use the iterator API to quickly extract statistics for many files at once into a single large array (e.g. by reading the data with `ParquetMetaDataReader`), so we can prune across 100s of files very fast. However, I will fully admit that no such code exists yet (either in our InfluxDB code or as an example) 🤔
