[
https://issues.apache.org/jira/browse/IMPALA-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Smith resolved IMPALA-3841.
-----------------------------------
Fix Version/s: Impala 5.0.0
Resolution: Fixed
> Avoid materializing nested collections if top-level predicates already
> disqualify the row.
> ------------------------------------------------------------------------------------------
>
> Key: IMPALA-3841
> URL: https://issues.apache.org/jira/browse/IMPALA-3841
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 2.5.0, Impala 2.6.0
> Reporter: Alexander Behm
> Assignee: Xuebin Su
> Priority: Critical
> Labels: complextype, nested_types, parquet, performance
> Fix For: Impala 5.0.0
>
>
> Today, we fully materialize a row before evaluating the top-level conjuncts
> when scanning Parquet. This includes materializing nested collections. We
> should avoid materializing nested collections if top-level conjuncts already
> discard the row. Our recent move to column-wise materialization makes this
> improvement feasible (IMPALA-2736).
> To illustrate the problem, consider this query:
> {code}
> select * from customer c, c.orders o where c.id = 10
> {code}
> Even though we have a very selective predicate on the top-level customer, our
> scanner will still fully materialize all orders of all customers. The
> non-matches will be filtered, but we still pay the cost of materializing the
> orders.
> The proposed improvement is to avoid materializing the orders of
> non-qualifying customers.
> The improvement will several things:
> * Analyze and separate the top-level conjuncts into those that can be
> evaluated before materializing the nested collections and those that require
> nested collections to be materialized. In particular, we need to be careful
> with our auto-generated !empty() predicates on nested collections.
> * Add a new SkipValues() or similar interface to the Parquet column readers
> to advances the scanner without actually materializing values. If possible,
> we should skip entire blocks.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)