-- Sending as an email in case Jira messages are filtered out. Please let me know your thoughts on this. Thanks!
Jira ticket: https://issues.apache.org/jira/browse/PARQUET-2210

Currently, we do not use the statistics stored in the page headers to prune the rows that we read. Pruning at row-group granularity is very coarse, and in many cases the row group cannot be pruned at all.

I propose adding a FilteredPageReader that would accept a filter and, based on page statistics, would not return the pages that cannot match the filter. The initial set of filters can be: EQUALS, IS NULL, IS NOT NULL.

The FilteredPageReader will also keep track of which row ranges matched and which did not. We could use this to skip the non-matching rows when reading the rest of the columns. Note that the SkipRecords API is being added to the Parquet reader (https://issues.apache.org/jira/browse/PARQUET-2188).
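
To make the idea concrete, here is a rough sketch of how such a reader might behave. It is written in Python only for brevity and is not the Parquet API: Page, PageStats, Filter, Op, and page_may_match are hypothetical names, the column is assumed to be flat (one value per row), and a page without statistics is conservatively treated as a possible match.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Iterable, Iterator, List, Optional, Tuple


class Op(Enum):
    EQUALS = "equals"
    IS_NULL = "is_null"
    IS_NOT_NULL = "is_not_null"


@dataclass
class PageStats:
    # Hypothetical stand-in for the statistics kept in a page header.
    min_value: Optional[Any]
    max_value: Optional[Any]
    null_count: int


@dataclass
class Page:
    num_rows: int                # assumes a flat column: one value per row
    stats: Optional[PageStats]   # None when the writer stored no statistics


@dataclass
class Filter:
    op: Op
    value: Any = None            # only used by EQUALS


def page_may_match(page: Page, flt: Filter) -> bool:
    """Return False only when the statistics prove that no row can match."""
    stats = page.stats
    if stats is None:
        return True  # nothing to prune on, so keep the page
    if flt.op is Op.IS_NULL:
        return stats.null_count > 0
    non_null = page.num_rows - stats.null_count
    if flt.op is Op.IS_NOT_NULL:
        return non_null > 0
    # EQUALS: an all-null page cannot match; otherwise check the min/max range.
    if non_null == 0:
        return False
    if stats.min_value is None or stats.max_value is None:
        return True
    return stats.min_value <= flt.value <= stats.max_value


def _append_range(ranges: List[Tuple[int, int]], rng: Tuple[int, int]) -> None:
    """Append a half-open row range, merging it with an adjacent previous one."""
    if ranges and ranges[-1][1] == rng[0]:
        ranges[-1] = (ranges[-1][0], rng[1])
    else:
        ranges.append(rng)


class FilteredPageReader:
    """Yields only pages that may match and records the row ranges seen."""

    def __init__(self, pages: Iterable[Page], flt: Filter) -> None:
        self._pages = list(pages)
        self._filter = flt
        self.matched_ranges: List[Tuple[int, int]] = []
        self.skipped_ranges: List[Tuple[int, int]] = []

    def __iter__(self) -> Iterator[Page]:
        # Reset the bookkeeping so the reader can be iterated more than once.
        self.matched_ranges = []
        self.skipped_ranges = []
        row = 0
        for page in self._pages:
            rng = (row, row + page.num_rows)  # half-open [start, end)
            row += page.num_rows
            if page_may_match(page, self._filter):
                _append_range(self.matched_ranges, rng)
                yield page
            else:
                _append_range(self.skipped_ranges, rng)
```

After the filtered column has been read, skipped_ranges holds the half-open row ranges that the other column readers could pass to a skip call such as the SkipRecords API mentioned above. Note that a page being returned only means it may contain matching rows; the values would still need to be checked row by row.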