fatemah created PARQUET-2210:
--------------------------------
Summary: Add FilteredPageReader to filter rows based on page statistics
Key: PARQUET-2210
URL: https://issues.apache.org/jira/browse/PARQUET-2210
Project: Parquet
Issue Type: New Feature
Reporter: fatemah
Currently, we do not use the statistics stored in the page headers to prune the
rows that we read. Row group pruning is very coarse-grained and in many cases
does not eliminate the row group. I propose adding a FilteredPageReader that
accepts a filter and does not return pages that, according to their page
statistics, cannot match the filter.
The initial set of filters could be: EQUALS, IS NULL, IS NOT NULL.
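
To make the pruning rule concrete, here is a minimal sketch in Python (not the
Parquet reader's actual code). PageStats, FilterOp and page_may_match are
hypothetical names, and the statistics fields are a simplified stand-in for
what a page header carries. The sketch assumes integer values and treats
missing statistics conservatively, so a page is dropped only when the
statistics prove that no row in it can match.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FilterOp(Enum):
    EQUALS = "EQUALS"
    IS_NULL = "IS NULL"
    IS_NOT_NULL = "IS NOT NULL"


@dataclass
class PageStats:
    # Hypothetical, simplified view of the statistics in a page header.
    min_value: Optional[int]   # None if the statistic is missing
    max_value: Optional[int]   # None if the statistic is missing
    null_count: Optional[int]  # None if the writer did not record it
    num_values: int            # values in the page, nulls included


def page_may_match(stats: PageStats, op: FilterOp,
                   value: Optional[int] = None) -> bool:
    """Return False only when the statistics prove that no row matches."""
    if op is FilterOp.IS_NULL:
        # Prune only when we know the page contains no nulls.
        return stats.null_count is None or stats.null_count > 0
    if op is FilterOp.IS_NOT_NULL:
        # Prune only when we know every value in the page is null.
        return stats.null_count is None or stats.null_count < stats.num_values
    if op is FilterOp.EQUALS:
        if stats.null_count is not None and stats.null_count == stats.num_values:
            return False  # all nulls: nothing can equal the value
        if stats.min_value is None or stats.max_value is None:
            return True   # no min/max available: keep the page
        return stats.min_value <= value <= stats.max_value
    raise ValueError(f"unsupported filter: {op}")
```

A page that passes this check can still contain non-matching rows, so the
filter would have to be re-applied to the decoded values if exact results are
needed.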
The FilteredPageReader would also keep track of which row ranges matched and
which did not. We could use this to skip reading the non-matching rows from the
rest of the columns. Note that the SkipRecords API was recently added to the
Parquet reader (https://issues.apache.org/jira/browse/PARQUET-2188).
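
The row-range bookkeeping could look roughly like the following Python sketch.
The function names and the (num_rows, matched) input shape are assumptions made
for illustration, and the sketch assumes a flat (non-nested) column so that the
number of rows per page is known. It merges adjacent matching pages into
half-open row ranges, then turns those ranges into a skip/read plan that
another column's reader could follow, where each "skip" step stands for a call
to a skip API such as the one added in PARQUET-2188.

```python
from typing import Iterable, List, Tuple

RowRange = Tuple[int, int]  # half-open [start, end) row indices


def matched_row_ranges(pages: Iterable[Tuple[int, bool]]) -> List[RowRange]:
    """pages yields (num_rows, matched) for each page, in file order."""
    ranges: List[RowRange] = []
    row = 0
    for num_rows, matched in pages:
        if matched and num_rows > 0:
            if ranges and ranges[-1][1] == row:
                # Extend the previous range when pages are adjacent.
                ranges[-1] = (ranges[-1][0], row + num_rows)
            else:
                ranges.append((row, row + num_rows))
        row += num_rows
    return ranges


def skip_read_plan(ranges: List[RowRange],
                   total_rows: int) -> List[Tuple[str, int]]:
    """Turn matched ranges into ('skip', n) / ('read', n) steps."""
    plan: List[Tuple[str, int]] = []
    pos = 0
    for start, end in ranges:
        if start > pos:
            plan.append(("skip", start - pos))
        plan.append(("read", end - start))
        pos = end
    if pos < total_rows:
        plan.append(("skip", total_rows - pos))
    return plan


pages = [(100, False), (50, True), (50, True), (200, False), (10, True)]
ranges = matched_row_ranges(pages)
plan = skip_read_plan(ranges, total_rows=410)
```

Keeping the ranges in row coordinates rather than page coordinates matters
because page boundaries generally differ from one column to another.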