jorgecarleitao commented on pull request #68: URL: https://github.com/apache/arrow-datafusion/pull/68#issuecomment-913998272
@houqp, the `groups_filter: Arc<dyn Fn(usize, &RowGroupMetaData) -> bool>` argument of the `Reader` is used to filter row groups: the first argument is the row group's index and the second is its metadata, which includes all of its statistics. More recent versions also accept an `Arc<dyn Fn(&PageHeader) -> bool>` to skip individual pages based on their statistics. I am not familiar with this code base, but I would expect us to build such a filter from the query expressions: each row group is selected or skipped based on its metadata and statistics, and potentially on its index within the parquet file, for partitioned reads of a single file.
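To make the idea concrete, here is a minimal sketch of how a predicate such as `column = value` could be turned into a closure of that shape. It is not taken from either code base: `RowGroupMetaData` below is a hypothetical stand-in that models only the min/max statistics of a single `i64` column, and `GroupsFilter`, `equals_filter`, and `selected_groups` are illustrative names.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical stand-in for the real `RowGroupMetaData`: it models only the
/// min/max statistics of one i64 column.
#[derive(Debug, Clone)]
pub struct RowGroupMetaData {
    pub min: Option<i64>,
    pub max: Option<i64>,
}

/// Same shape as the `groups_filter` argument discussed above.
pub type GroupsFilter = Arc<dyn Fn(usize, &RowGroupMetaData) -> bool>;

/// Builds a filter for the predicate `column = value`, restricted to the row
/// groups whose index falls in `partition` (a half-open index range).
pub fn equals_filter(value: i64, partition: Range<usize>) -> GroupsFilter {
    Arc::new(move |index: usize, metadata: &RowGroupMetaData| -> bool {
        // Partitioned read of a single file: ignore groups owned by others.
        if !partition.contains(&index) {
            return false;
        }
        // Skip a group only when its statistics prove that no row can match;
        // when statistics are missing, the group must be read.
        match (metadata.min, metadata.max) {
            (Some(min), Some(max)) => min <= value && value <= max,
            _ => true,
        }
    })
}

/// Returns the indices of the row groups that the filter selects.
pub fn selected_groups(groups: &[RowGroupMetaData], filter: &GroupsFilter) -> Vec<usize> {
    let mut selected = Vec::new();
    for (index, metadata) in groups.iter().enumerate() {
        if filter(index, metadata) {
            selected.push(index);
        }
    }
    selected
}
```

The filter is deliberately conservative: statistics can only rule a row group out, never confirm a match, so the rows of every selected group still have to be checked against the predicate.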
