alamb opened a new issue, #9961: URL: https://github.com/apache/arrow-datafusion/issues/9961
### Is your feature request related to a problem or challenge?

@appletreeisyellow added `PruningStatistics::row_counts()` in https://github.com/apache/arrow-datafusion/pull/9223, which allows better pruning of columns that are all null.

However, I believe we have not hooked that API up into the `ParquetExec`, so it won't prune row groups based on this information.

For example, if column `a` is all NULL, a predicate `a > 5` can never be true, but the `ParquetExec` won't be able to prune row groups or pages for this case.

### Describe the solution you'd like

Implement `RowGroupPruningStatistics::row_counts`

https://github.com/apache/arrow-datafusion/blob/2dad90425bacb98a3c2a4214faad53850c93104e/datafusion/core/src/datasource/physical_plan/parquet/row_groups.rs#L345-L347

And `PagesPruningStatistics::row_counts`

https://github.com/apache/arrow-datafusion/blob/2dad90425bacb98a3c2a4214faad53850c93104e/datafusion/core/src/datasource/physical_plan/parquet/page_filter.rs#L550-L552

### Describe alternatives you've considered

I think the row counts can be found on https://docs.rs/parquet/latest/parquet/format/struct.ColumnMetaData.html

So this ticket should be a matter of copying the row counts correctly and writing some tests in https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/tests/parquet/row_group_pruning.rs / https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/tests/parquet/page_pruning.rs

### Additional context

_No response_
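
To illustrate why row counts matter for this kind of pruning, here is a minimal, standalone sketch of the all-NULL check. It does not use the DataFusion or parquet APIs: `ColumnStats`, `is_all_null`, and `row_groups_to_keep` are hypothetical names invented for this example, and the real implementation would instead return the row counts from `row_counts` so that the existing pruning logic can compare them against the null counts.

```rust
/// Hypothetical per-row-group statistics for a single column.
/// This is an illustrative type, not part of DataFusion or the parquet crate.
#[derive(Debug, Clone, Copy)]
struct ColumnStats {
    /// Number of NULL values in the row group, if known.
    null_count: Option<u64>,
    /// Total number of rows in the row group, if known.
    row_count: Option<u64>,
}

/// Returns true only when both counts are known and every row is NULL.
/// If either count is missing, we cannot prove anything, so return false.
fn is_all_null(stats: &ColumnStats) -> bool {
    match (stats.null_count, stats.row_count) {
        (Some(nulls), Some(rows)) => nulls == rows,
        _ => false,
    }
}

/// For a predicate such as `a > 5`, which can never be true for a NULL input,
/// returns one flag per row group: `true` means the row group must be
/// read, `false` means it can be skipped.
fn row_groups_to_keep(stats: &[ColumnStats]) -> Vec<bool> {
    stats.iter().map(|s| !is_all_null(s)).collect()
}
```

Without a row count, the check has nothing to compare the null count against and has to keep the row group, which is the situation described above.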
