tshauck commented on code in PR #10972: URL: https://github.com/apache/datafusion/pull/10972#discussion_r1644954474
########## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ########## @@ -552,6 +552,22 @@ make_data_page_stats_iterator!(MinInt32DataPageStatsIterator, min, Index::INT32, make_data_page_stats_iterator!(MaxInt32DataPageStatsIterator, max, Index::INT32, i32); make_data_page_stats_iterator!(MinInt64DataPageStatsIterator, min, Index::INT64, i64); make_data_page_stats_iterator!(MaxInt64DataPageStatsIterator, max, Index::INT64, i64); +make_data_page_stats_iterator!( Review Comment: Oooff, sorry @tmi, I should've done `take` on the issue. Taking @alamb's pointer to look at row groups, I might be wrong, but I think there's a couple of differences between it and the data page stats that complicate things... * `max_bytes`/`min_bytes`, which is used for row groups, does not exist on `PageIndex` in the parquet crate but does on `Statistics`, so using it in the macro isn't possible at this point. * Using `[u8]` as the `stat_value_type` is problematic when combined with the iterator macro in that `type Item = Vec<Option<$stat_value_type>>;` can't work with `[u8]` because it's unsized. Changing the item def to `type Item = Vec<Option<&'a $stat_value_type>>;` works and is similar to the item definition of row group (`type Item = Option<&'a $stat_value_type>;`). f16 is less common, so maybe to have a solution that's similar to Statistics and hopefully avoids cloning, we could: 1. Update this PR to do f32 and f64, get it in to support higher priority types and is mostly innocuous 2. Update the parquet crate to support `max_bytes`/`min_bytes` on `PageIndex` 3. Update the data page stats to use a `&'a $stat_value_type` and add f16 when that's merged @tmi, maybe you and I could collaborate on 2 & 3 if that seems reasonable? Otherwise, certainly happy to go another route, if there's a better approach. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org