[GitHub] [arrow-datafusion] jorgecarleitao commented on pull request #68: Experimenting with arrow2

GitBox Mon, 06 Sep 2021 22:19:27 -0700


jorgecarleitao commented on pull request #68:
URL: https://github.com/apache/arrow-datafusion/pull/68#issuecomment-913998272



   @houqp, the argument `groups_filter: Arc<dyn Fn(usize, &RowGroupMetaData) 
-> bool>` of the `Reader` is used to filter row groups: the first argument is 
the row group's index in the file, and the second is its metadata, which 
includes all of its statistics.
   
   In more recent versions we also have `Arc<dyn Fn(&PageHeader) -> bool>` to 
skip individual pages based on their statistics.
   
   I am not familiar with this code base, but I would expect us to build 
such a filter from expressions: each row group is selected or skipped based 
on its metadata (including statistics) and, for partitioned reads of a single 
file, potentially its index in the parquet file.
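
A rough sketch of that idea follows, under loud assumptions: `RowGroupMetaData` here is a hypothetical stand-in with a single optional min/max pair (the real type exposes statistics through its own API), and `Predicate`, `build_filter` and `select_row_groups` are made-up names for illustration, not anything in DataFusion or the parquet reader. Only the closure type matches the `groups_filter` argument described above.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the reader's `RowGroupMetaData`.
/// Assumption: one i64 column whose min/max statistics may be absent.
#[derive(Debug, Clone)]
pub struct RowGroupMetaData {
    pub min_max: Option<(i64, i64)>,
}

/// Same shape as the `groups_filter` argument of the `Reader`.
pub type GroupsFilter = Arc<dyn Fn(usize, &RowGroupMetaData) -> bool>;

/// Hypothetical, deliberately tiny "expression" over that one column.
pub enum Predicate {
    /// column > value
    Gt(i64),
    /// column <= value
    LtEq(i64),
}

/// Builds a row group filter from a predicate plus a partition assignment.
/// Row group `i` belongs to partition `i % partitions`.
pub fn build_filter(predicate: Predicate, partition: usize, partitions: usize) -> GroupsFilter {
    assert!(partitions > 0, "at least one partition is required");
    Arc::new(move |index: usize, metadata: &RowGroupMetaData| {
        // Partitioned read of a single file: skip groups owned by other partitions.
        if index % partitions != partition {
            return false;
        }
        match (&predicate, metadata.min_max) {
            // Without statistics nothing can be proven, so the group is kept.
            (_, None) => true,
            // `col > v` can only match if the group's maximum exceeds `v`.
            (Predicate::Gt(v), Some((_, max))) => max > *v,
            // `col <= v` can only match if the group's minimum is at most `v`.
            (Predicate::LtEq(v), Some((min, _))) => min <= *v,
        }
    })
}

/// Returns the indices of the row groups that the filter keeps.
pub fn select_row_groups(filter: &GroupsFilter, groups: &[RowGroupMetaData]) -> Vec<usize> {
    groups
        .iter()
        .enumerate()
        .filter_map(|(index, metadata)| {
            if filter(index, metadata) {
                Some(index)
            } else {
                None
            }
        })
        .collect()
}
```

The filter is conservative: a row group is skipped only when its statistics prove that no row can match, so the predicate still has to be evaluated on the rows that are actually read.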


