[ https://issues.apache.org/jira/browse/PARQUET-128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yash Datta updated PARQUET-128:
-------------------------------
Summary: Optimize the parquet RecordReader implementation when: A.
filterpredicate is pushed down , B. filterpredicate is pushed down on a flat
schema (was: Optimize the parquet RecordReader implementation when A.
filterpredicate is pushed down , B. filterpredicate is pushed down on a flat
schema )
> Optimize the parquet RecordReader implementation when: A. filterpredicate is
> pushed down , B. filterpredicate is pushed down on a flat schema
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: PARQUET-128
> URL: https://issues.apache.org/jira/browse/PARQUET-128
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Affects Versions: 1.6.0rc2
> Reporter: Yash Datta
> Fix For: parquet-mr_1.6.0
>
>
> The RecordReader implementation currently reads all the columns before
> applying the filter predicate and deciding whether to keep or discard the
> row.
> Instead, we can have a RecordReader that first assembles only the columns on
> which filters are applied (usually only a few), applies the filter to decide
> whether to keep the row, and then either assembles or skips the remaining
> columns accordingly.
> The performance improvement from this change is significant, and it is larger
> when filtering returns fewer rows (which is usually the case) and when the
> schema has many columns.
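A minimal sketch of the two-phase idea described in the issue, written in Python for brevity (parquet-mr itself is Java). This is not the parquet-mr RecordReader. The names read_eager, read_lazy and the in-memory column layout are hypothetical. The sketch assumes a flat schema where every column holds exactly one value per row, so skipping a row's remaining columns just means not indexing them. A real column reader would still have to advance past the encoded values it does not assemble.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, object]
# Hypothetical layout: column name -> values, one value per row (flat schema).
Columns = Dict[str, Sequence[object]]


def _num_rows(columns: Columns) -> int:
    # In a flat schema every column has the same length, one value per row.
    return len(next(iter(columns.values()))) if columns else 0


def read_eager(columns: Columns,
               predicate: Callable[[Row], bool]) -> Tuple[List[Row], int]:
    """Baseline: assemble every column of a row, then apply the predicate."""
    out: List[Row] = []
    assembled = 0  # count of values materialized
    for i in range(_num_rows(columns)):
        row = {name: values[i] for name, values in columns.items()}
        assembled += len(row)
        if predicate(row):
            out.append(row)
    return out, assembled


def read_lazy(columns: Columns, predicate: Callable[[Row], bool],
              filter_cols: Sequence[str]) -> Tuple[List[Row], int]:
    """Two-phase: assemble only the filter columns, test, then assemble or skip the rest."""
    rest = [name for name in columns if name not in filter_cols]
    out: List[Row] = []
    assembled = 0
    for i in range(_num_rows(columns)):
        # Phase 1: materialize only the columns the predicate reads.
        row = {name: columns[name][i] for name in filter_cols}
        assembled += len(row)
        if not predicate(row):
            # Row rejected: its remaining columns are never assembled.
            continue
        # Row kept: assemble the remaining columns.
        for name in rest:
            row[name] = columns[name][i]
        assembled += len(rest)
        out.append(row)
    return out, assembled


def is_us(row: Row) -> bool:
    # Example predicate that reads only the "country" column.
    return row["country"] == "US"


if __name__ == "__main__":
    cols = {
        "id": [1, 2, 3, 4],
        "country": ["US", "DE", "US", "FR"],
        "a": [10, 20, 30, 40],
        "b": ["x", "y", "z", "w"],
    }
    eager_rows, eager_cost = read_eager(cols, is_us)
    lazy_rows, lazy_cost = read_lazy(cols, is_us, ["country"])
    assert eager_rows == lazy_rows  # same rows either way
    print(eager_cost, lazy_cost)  # 16 10
```

With N rows, C columns, F filter columns and K matching rows, the eager reader assembles N*C values while the lazy one assembles N*F + K*(C - F). This is why the saving grows as the filter becomes more selective and the schema gets wider. The model only counts materialized values and is not a performance measurement.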
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)