alamb opened a new pull request, #10813: URL: https://github.com/apache/datafusion/pull/10813
## Which issue does this PR close?

Closes https://github.com/apache/datafusion/issues/9929

## Rationale for this change

Many query engines / use cases have some sort of specialized index for data stored in parquet. This index can be used to determine which row groups / selections within a file are needed.

However, the DataFusion `ParquetExec` has no way for users to pass this information in. Instead, it tries to prune row groups based on the min/max statistics and other information in the file's metadata.

This PR makes it possible for users to pass in a `ParquetAccessPlan` to `ParquetExec` with a starting plan, which is then further pruned based on the file's metadata.

## What changes are included in this PR?

1. Allow users to pass in a `ParquetAccessPlan` for each `PartitionedFile` read by `ParquetExec`
2. Add error checking to `ParquetAccessPlan` now that it can be specified by users
3. Document how this works
4. Add tests for this new API

## Are these changes tested?

Yes, new tests are added

## Are there any user-facing changes?
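
To illustrate the idea of a user-supplied starting plan that is then further pruned, here is a minimal, self-contained sketch. It does not use the DataFusion API: `AccessPlan`, `RowGroupAccess`, `check_len`, `prune_by_min_max`, and the single-column `(min, max)` statistics are hypothetical stand-ins invented for this example, and the predicate is assumed to be a simple equality on one integer column.

```rust
/// Hypothetical per-row-group decision (a stand-in, not DataFusion's type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RowGroupAccess {
    Skip,
    Scan,
}

/// Hypothetical access plan: one decision per row group in a file.
#[derive(Debug, Clone, PartialEq, Eq)]
struct AccessPlan {
    row_groups: Vec<RowGroupAccess>,
}

impl AccessPlan {
    /// Start by scanning every row group.
    fn new_all(num_row_groups: usize) -> Self {
        Self { row_groups: vec![RowGroupAccess::Scan; num_row_groups] }
    }

    /// Mark one row group as not needed (e.g. ruled out by an external index).
    fn skip(&mut self, idx: usize) {
        self.row_groups[idx] = RowGroupAccess::Skip;
    }

    /// Error checking for user-supplied plans: the plan must describe
    /// exactly as many row groups as the file actually contains.
    fn check_len(&self, num_row_groups: usize) -> Result<(), String> {
        if self.row_groups.len() == num_row_groups {
            Ok(())
        } else {
            Err(format!(
                "plan has {} row groups but file has {}",
                self.row_groups.len(),
                num_row_groups
            ))
        }
    }

    /// Further prune the plan using (min, max) statistics for the predicate
    /// `column = value`. Row groups that are already skipped stay skipped.
    fn prune_by_min_max(&mut self, stats: &[(i64, i64)], value: i64) {
        for (access, (min, max)) in self.row_groups.iter_mut().zip(stats) {
            if *access == RowGroupAccess::Scan && (value < *min || value > *max) {
                *access = RowGroupAccess::Skip;
            }
        }
    }

    /// Indexes of the row groups that still need to be read.
    fn scanned(&self) -> Vec<usize> {
        self.row_groups
            .iter()
            .enumerate()
            .filter(|(_, a)| **a == RowGroupAccess::Scan)
            .map(|(i, _)| i)
            .collect()
    }
}

fn main() {
    // Assumed file layout: 4 row groups with these (min, max) values.
    let stats = [(0, 9), (10, 19), (20, 29), (30, 39)];

    // The external index says only row groups 1 and 3 can contain matches.
    let mut plan = AccessPlan::new_all(4);
    plan.skip(0);
    plan.skip(2);
    plan.check_len(stats.len()).expect("plan must match the file");

    // The file's own statistics then prune the starting plan further.
    plan.prune_by_min_max(&stats, 35);
    println!("row groups to scan: {:?}", plan.scanned());
}
```

In this sketch the external index narrows the file to row groups 1 and 3, and the min/max statistics then rule out row group 1, so only row group 3 is read. The length check mirrors the kind of validation that becomes necessary once the plan comes from users rather than being built internally.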
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org