adriangb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2753068116
Noting a future optimization to pursue after this work: push pre-collected stats down into ParquetSource / DataSourceExec so that dynamic filters can use them to prune files without having to open them at all.

This only helps when stats were collected during the planning phase (e.g. by ListingTableProvider or a secondary index) but did not prune the file because there was no suitable filter at the time, and a dynamically generated filter later _can_ prune based on those stats. In that case we avoid reading or re-reading the Parquet metadata.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
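As a rough illustration of the idea in the comment above, here is a minimal, self-contained sketch of pruning files from plan-time min/max stats once a dynamic filter becomes available. It does not use any DataFusion API: `ColumnStats`, `DynamicFilter`, `can_skip_file`, and `files_to_open` are hypothetical names invented for this example, and the stats are assumed to cover a single non-null integer column.

```rust
/// Hypothetical min/max statistics for one column of one file,
/// assumed to have been collected during planning.
#[derive(Debug, Clone, Copy)]
struct ColumnStats {
    min: i64,
    max: i64,
}

/// Hypothetical dynamic filter on that column. The bound is only known at
/// runtime (for example, it could tighten as a TopK operator fills up).
#[derive(Debug, Clone, Copy)]
enum DynamicFilter {
    /// column > bound
    GreaterThan(i64),
    /// column < bound
    LessThan(i64),
}

/// Returns true when the stats prove that no row in the file can match.
/// Files without stats are never skipped.
fn can_skip_file(stats: Option<ColumnStats>, filter: DynamicFilter) -> bool {
    match (stats, filter) {
        // Every value is <= max, so nothing can exceed the bound.
        (Some(s), DynamicFilter::GreaterThan(bound)) => s.max <= bound,
        // Every value is >= min, so nothing can fall below the bound.
        (Some(s), DynamicFilter::LessThan(bound)) => s.min >= bound,
        (None, _) => false,
    }
}

/// Returns the names of the files that still have to be opened.
fn files_to_open<'a>(
    files: &[(&'a str, Option<ColumnStats>)],
    filter: DynamicFilter,
) -> Vec<&'a str> {
    files
        .iter()
        .filter(|(_, stats)| !can_skip_file(*stats, filter))
        .map(|(name, _)| *name)
        .collect()
}
```

With stats of `[0, 10]` for one file and `[50, 90]` for another, a filter of `GreaterThan(10)` skips the first file without touching its Parquet footer, while a file with no stats is always kept. In a real implementation the decision would presumably go through the existing pruning machinery rather than a hand-written comparison like this one.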