alamb opened a new pull request, #10738:
URL: https://github.com/apache/datafusion/pull/10738

   Builds on https://github.com/apache/datafusion/pull/10727
   
   ## Which issue does this PR close?
   
   This is part of https://github.com/apache/datafusion/issues/9929, a way to 
provide row selections from an outside index to the parquet reader
   
   
   ## Rationale for this change
   
   My high-level plan, which you can see in action in 
https://github.com/apache/datafusion/pull/10701, is that the `ParquetAccessPlan` 
is the structure that users pass to the parquet reader to select row groups and 
pages.
   
   The idea is that a user can provide a starting `ParquetAccessPlan`, which the 
parquet reader can then refine further by reading file metadata, if needed.
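To make the "starting plan, then refinement" idea concrete, here is a minimal sketch. The types and the `refine_with_metadata` helper are simplified, hypothetical stand-ins written for this illustration; they are not the definitions in this PR, and the "metadata" is reduced to one max value per row group.

```rust
use std::ops::Range;

// Simplified stand-in types for illustration only (assumed, not DataFusion's).

/// How one row group should be read.
#[derive(Debug, Clone, PartialEq)]
enum RowGroupAccess {
    /// Do not read the row group at all.
    Skip,
    /// Read every row in the row group.
    Scan,
    /// Read only the listed half-open row ranges within the row group.
    Selection(Vec<Range<usize>>),
}

/// One access decision per row group in a file.
#[derive(Debug, Clone, PartialEq)]
struct AccessPlan {
    row_groups: Vec<RowGroupAccess>,
}

impl AccessPlan {
    /// Start by scanning every row group.
    fn new_all(row_group_count: usize) -> Self {
        Self { row_groups: vec![RowGroupAccess::Scan; row_group_count] }
    }

    /// Mark a row group as skipped.
    fn skip(&mut self, idx: usize) {
        self.row_groups[idx] = RowGroupAccess::Skip;
    }

    /// Restrict a row group to the given ranges, replacing any earlier
    /// selection. A row group that is already skipped stays skipped.
    fn scan_selection(&mut self, idx: usize, ranges: Vec<Range<usize>>) {
        if self.row_groups[idx] != RowGroupAccess::Skip {
            self.row_groups[idx] = RowGroupAccess::Selection(ranges);
        }
    }

    /// True unless the row group has been skipped.
    fn should_scan(&self, idx: usize) -> bool {
        self.row_groups[idx] != RowGroupAccess::Skip
    }
}

/// Hypothetical refinement step for a predicate `value > threshold`:
/// skip every row group whose max value shows it cannot match.
/// Decisions already made by the caller are only ever narrowed.
fn refine_with_metadata(mut plan: AccessPlan, max_values: &[i64], threshold: i64) -> AccessPlan {
    let count = plan.row_groups.len().min(max_values.len());
    for idx in 0..count {
        if plan.should_scan(idx) && max_values[idx] <= threshold {
            plan.skip(idx);
        }
    }
    plan
}
```

A caller with an outside index would build the plan first (for example `skip(0)` and `scan_selection(1, vec![10..20])`) and hand it to the reader, which can only remove more work from it.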
   
   ## What changes are included in this PR?
   1. Split the representation of which row groups to scan 
(`ParquetAccessPlan`) from the code that prunes them (`RowGroupBuilder`)
   2. Change page pruning to update the `ParquetAccessPlan`
   3. Copious documentation
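As a rough illustration of item 2, page pruning decides per page whether it may contain matching rows, and that decision has to be turned into row ranges before it can be recorded in a plan for a row group. The sketch below shows one way to do that conversion; `kept_row_ranges` and its inputs are hypothetical names for this example and do not come from the PR.

```rust
use std::ops::Range;

/// Convert per-page keep/prune decisions into row ranges within one row group.
/// `page_row_counts[i]` is the number of rows in page `i`; `keep[i]` says
/// whether page pruning decided the page may contain matching rows.
/// Adjacent kept pages are merged into a single range.
fn kept_row_ranges(page_row_counts: &[usize], keep: &[bool]) -> Vec<Range<usize>> {
    let mut ranges: Vec<Range<usize>> = Vec::new();
    let mut start = 0;
    for (&rows, &keep_page) in page_row_counts.iter().zip(keep) {
        let end = start + rows;
        if keep_page && rows > 0 {
            match ranges.last_mut() {
                // Extend the previous range when this page directly follows it.
                Some(last) if last.end == start => last.end = end,
                // Otherwise open a new range for this page.
                _ => ranges.push(start..end),
            }
        }
        start = end;
    }
    ranges
}
```

Keeping this conversion separate from the plan itself mirrors the split described in item 1: the pruning code computes what to read, and the plan only stores the result.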
   
   ## Are these changes tested?
   Covered by existing tests.
   
   ## Are there any user-facing changes?
   Not yet -- this is all internal. I will make a follow-on PR with a proposed 
API for passing this structure in.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

