alamb opened a new pull request, #10738: URL: https://github.com/apache/datafusion/pull/10738
Builds on https://github.com/apache/datafusion/pull/10727

## Which issue does this PR close?

This is part of https://github.com/apache/datafusion/issues/9929, a way to provide row selections from an outside index to the parquet reader.

## Rationale for this change

My high-level plan, which you can see in action in https://github.com/apache/datafusion/pull/10701, is that the `ParquetAccessPlan` is the structure that users pass to the parquet reader to select row groups and pages.

The idea is that a user can provide a starting `ParquetAccessPlan`, which the parquet reader can then refine further by reading file metadata, if needed.

## What changes are included in this PR?

1. Split out the representation of which RowGroups to scan from the code to prune them (`RowGroupBuilder`)
2. Update the page pruning to update `ParquetAccessPlan`
3. Copious documentation

## Are these changes tested?

Covered by existing tests

## Are there any user-facing changes?

Not yet -- this is all internal. I will make a follow-on PR with a proposed API for passing this structure in.
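To make the "starting plan that is then refined" idea concrete, here is a minimal, self-contained sketch. It is not the code in this PR: `AccessPlanSketch`, `RowGroupAccessSketch`, and all method names are hypothetical stand-ins chosen for illustration, and the `(bool, usize)` selection runs are an assumed simplification of a real row selection.

```rust
/// How a single row group should be read (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum RowGroupAccessSketch {
    /// Do not read the row group at all.
    Skip,
    /// Read every row in the row group.
    Scan,
    /// Read only some rows, described as (select, row_count) runs.
    Selection(Vec<(bool, usize)>),
}

/// One entry per row group in the file.
/// Hypothetical stand-in for the `ParquetAccessPlan` described above.
#[derive(Debug, Clone, PartialEq)]
pub struct AccessPlanSketch {
    row_groups: Vec<RowGroupAccessSketch>,
}

impl AccessPlanSketch {
    /// Starting plan that scans every row group.
    pub fn new_all(num_row_groups: usize) -> Self {
        Self {
            row_groups: vec![RowGroupAccessSketch::Scan; num_row_groups],
        }
    }

    /// Set the access for one row group, e.g. from an external index.
    pub fn set(&mut self, idx: usize, access: RowGroupAccessSketch) {
        self.row_groups[idx] = access;
    }

    /// True if any row of row group `idx` will be read.
    pub fn should_scan(&self, idx: usize) -> bool {
        self.row_groups[idx] != RowGroupAccessSketch::Skip
    }

    /// Refinement step: skip every still-needed row group for which
    /// `can_prune` returns true (e.g. based on file metadata).
    /// A row group that is already skipped is never re-enabled.
    pub fn prune_with<F: Fn(usize) -> bool>(&mut self, can_prune: F) {
        for (idx, access) in self.row_groups.iter_mut().enumerate() {
            if *access != RowGroupAccessSketch::Skip && can_prune(idx) {
                *access = RowGroupAccessSketch::Skip;
            }
        }
    }

    /// Indexes of the row groups that will be read at least partially.
    pub fn row_group_indexes(&self) -> Vec<usize> {
        self.row_groups
            .iter()
            .enumerate()
            .filter(|(_, access)| **access != RowGroupAccessSketch::Skip)
            .map(|(idx, _)| idx)
            .collect()
    }
}
```

In this sketch a caller would build the starting plan with `new_all`, narrow it with `set` using whatever an outside index knows, and hand it to the reader, which could call something like `prune_with` after inspecting file metadata. The one property the sketch tries to capture is that refinement only narrows the plan and never widens it.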