alamb opened a new issue, #16402: URL: https://github.com/apache/datafusion/issues/16402
### Is your feature request related to a problem or challenge?

- This is a follow-on to the feature added by @adriangb in https://github.com/apache/datafusion/pull/16014

@adriangb added the great feature that can prune entire files while opening many parquet files.

The current statistics for `DataSourceExec` report how many row groups were pruned. It would also be great to add statistics on how many **FILES** were pruned by this new code.

For example, with ClickBench Q24, here is an excerpt:

```sql
EXPLAIN ANALYZE SELECT "SearchPhrase" FROM hits WHERE "SearchPhrase" <> '' ORDER BY "SearchPhrase" LIMIT 10;
```

```
| | DataSourceExec:... pushdown_rows_pruned=0, row_groups_matched_bloom_filter=0, row_groups_matched_statistics=325, row_groups_pruned_bloom_filter=0, row_groups_pruned_statistics=0
```

### Describe the solution you'd like

I would like some new statistics that record:
* `files_pruned`: total files that were pruned by filters during open

It is important that the docs explain that the metric only counts files pruned after the plan starts executing (not files that are pruned during planning).

### Describe alternatives you've considered

1. Add a field to `ParquetFileMetrics`: https://github.com/apache/datafusion/blob/6d5e00ad3f8e53f7252cb1d3c72a6c7f28c1aed6/datafusion/datasource-parquet/src/metrics.rs#L29-L28
2. Thread that through to the opener in `datafusion/datasource-parquet/src/opener.rs` so that when files are pruned we can see that in the metrics

### Additional context

_No response_
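To make the two steps above concrete, here is a minimal, self-contained sketch of the idea: a `files_pruned` counter lives in a metrics struct and is bumped by the opener whenever a file is skipped at open time. It deliberately does not use the real DataFusion types. `Counter`, `FileMetrics`, `FileStats`, and `open_files` are hypothetical stand-ins, and the predicate (`col > threshold`, checked against a per-file maximum) is an assumption chosen only for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a shared metrics counter
/// (NOT DataFusion's own metric type).
#[derive(Debug, Default, Clone)]
struct Counter(Arc<AtomicUsize>);

impl Counter {
    fn add(&self, n: usize) {
        self.0.fetch_add(n, Ordering::Relaxed);
    }
    fn value(&self) -> usize {
        self.0.load(Ordering::Relaxed)
    }
}

/// Hypothetical subset of the per-scan file metrics, with the proposed field.
#[derive(Debug, Default, Clone)]
struct FileMetrics {
    /// Files skipped by filters while opening (not during planning).
    files_pruned: Counter,
}

/// Hypothetical file-level statistics: the maximum value of one column.
struct FileStats {
    max: i64,
}

/// Assumed predicate: `col > threshold`. A file whose maximum is at or
/// below the threshold cannot contain a matching row, so it is skipped
/// and counted. Returns the indices of the files that are actually opened.
fn open_files(files: &[FileStats], threshold: i64, metrics: &FileMetrics) -> Vec<usize> {
    let mut opened = Vec::new();
    for (idx, stats) in files.iter().enumerate() {
        if stats.max <= threshold {
            metrics.files_pruned.add(1);
            continue;
        }
        opened.push(idx);
    }
    opened
}

fn main() {
    let files = [
        FileStats { max: 10 },
        FileStats { max: 50 },
        FileStats { max: 90 },
    ];
    let metrics = FileMetrics::default();
    let opened = open_files(&files, 20, &metrics);
    println!(
        "opened={:?} files_pruned={}",
        opened,
        metrics.files_pruned.value()
    );
}
```

Because the counter is only touched inside the open path, it naturally excludes files that were already eliminated during planning, which is the distinction the docs would need to call out.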