alamb commented on code in PR #12032:
URL: https://github.com/apache/datafusion/pull/12032#discussion_r1721656073
##########
datafusion/core/src/datasource/physical_plan/parquet/row_group_filter.rs:
##########
@@ -868,7 +917,7 @@ mod tests {
             &pruning_predicate,
             &metrics,
         );
-        assert_pruned(row_groups, ExpectedPruning::Some(vec![0, 1, 3]));
+        assert_pruned(row_groups, ExpectedPruning::Some(vec![0, 1]));

Review Comment:
   Note that this result is different, and it is an improvement.

   Previously, if *either* min or max was unknown, `has_statistics_set()` would return false, and thus neither min nor max was reported (basically, `Statistics` could not distinguish between having only min or only max set):

   https://github.com/apache/arrow-rs/blob/27789d7c9abb50796a4042e7e193703efe3c95b3/parquet/src/file/statistics.rs#L635-L637

   After https://github.com/apache/arrow-rs/pull/6216, `Statistics` can distinguish between having only one field set, so row group index 3 can now be pruned (as its max is known to be 2).

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
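To illustrate the behavior change described in the review comment, here is a minimal, self-contained Rust sketch. It is not the DataFusion or arrow-rs code: `ColumnStats`, `usable_bounds_old`, `usable_bounds_new`, `can_prune_gt`, `remaining`, and `example_groups` are all hypothetical names, and the predicate (`col > 5`) and the statistics values are made up so that the result mirrors the change from `vec![0, 1, 3]` to `vec![0, 1]` in the diff.

```rust
/// Hypothetical stand-in for per-row-group column statistics.
/// This is NOT the arrow-rs `Statistics` type.
#[derive(Debug, Clone, Copy)]
struct ColumnStats {
    min: Option<i64>,
    max: Option<i64>,
}

/// (min, max) bounds as seen by the pruning logic.
type Bounds = (Option<i64>, Option<i64>);

/// Models the old behavior: unless BOTH min and max are known,
/// neither bound is reported.
fn usable_bounds_old(s: &ColumnStats) -> Bounds {
    match (s.min, s.max) {
        (Some(min), Some(max)) => (Some(min), Some(max)),
        _ => (None, None),
    }
}

/// Models the new behavior: each bound is reported independently.
fn usable_bounds_new(s: &ColumnStats) -> Bounds {
    (s.min, s.max)
}

/// For an assumed predicate `col > threshold`, a row group can be skipped
/// only when its max is known and does not exceed the threshold.
fn can_prune_gt(bounds: Bounds, threshold: i64) -> bool {
    match bounds.1 {
        Some(max) => max <= threshold,
        None => false, // unknown max: the row group must be kept
    }
}

/// Returns the indexes of the row groups that REMAIN after pruning.
fn remaining(
    groups: &[ColumnStats],
    threshold: i64,
    bounds: fn(&ColumnStats) -> Bounds,
) -> Vec<usize> {
    let mut kept = Vec::new();
    for (i, s) in groups.iter().enumerate() {
        if !can_prune_gt(bounds(s), threshold) {
            kept.push(i);
        }
    }
    kept
}

/// Made-up statistics for four row groups (not the real test fixture).
fn example_groups() -> [ColumnStats; 4] {
    [
        ColumnStats { min: Some(1), max: Some(10) }, // may match: kept
        ColumnStats { min: None, max: None },        // nothing known: kept
        ColumnStats { min: Some(0), max: Some(3) },  // max <= 5: pruned
        ColumnStats { min: None, max: Some(2) },     // only max known
    ]
}

fn main() {
    let groups = example_groups();
    println!("old: {:?}", remaining(&groups, 5, usable_bounds_old));
    println!("new: {:?}", remaining(&groups, 5, usable_bounds_new));
}
```

In this sketch, row group 3 has only its max set. Under the old model the max is discarded along with the missing min, so the group is kept; under the new model the max of 2 alone is enough to rule out `col > 5`, so the group is pruned.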