[ https://issues.apache.org/jira/browse/IMPALA-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17058894#comment-17058894 ]
Tim Armstrong commented on IMPALA-3841:
---------------------------------------

There's also an opportunity to optimise this for any kind of column, i.e. evaluate predicates before materialising other columns.

> Avoid materializing nested collections if top-level predicates already
> disqualify the row.
> ------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-3841
>                 URL: https://issues.apache.org/jira/browse/IMPALA-3841
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 2.5.0, Impala 2.6.0
>            Reporter: Alexander Behm
>            Priority: Minor
>              Labels: complextype, nested_types, parquet, performance
>
> Today, we fully materialize a row before evaluating the top-level conjuncts
> when scanning Parquet. This includes materializing nested collections. We
> should avoid materializing nested collections if top-level conjuncts already
> discard the row. Our recent move to column-wise materialization makes this
> improvement feasible (IMPALA-2736).
> To illustrate the problem, consider this query:
> {code}
> select * from customer c, c.orders o where c.id = 10
> {code}
> Even though we have a very selective predicate on the top-level customer, our
> scanner will still fully materialize all orders of all customers. The
> non-matches will be filtered, but we still pay the cost of materializing the
> orders.
> The proposed improvement is to avoid materializing the orders of
> non-qualifying customers.
> The improvement will require several things:
> * Analyze and separate the top-level conjuncts into those that can be
> evaluated before materializing the nested collections and those that require
> nested collections to be materialized. In particular, we need to be careful
> with our auto-generated !empty() predicates on nested collections.
> * Add a new SkipValues() or similar interface to the Parquet column readers
> to advance the scanner without actually materializing values.
> If possible, we should skip entire blocks.
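
The two steps in the quoted proposal can be sketched roughly as follows. This is a minimal Python illustration only, not Impala's C++ scanner code: `ColumnReader`, `read_value`, `skip_values` and `scan` are hypothetical names chosen for the example, and the customer/orders data is made up.

```python
from typing import Callable, Dict, List


class ColumnReader:
    """Hypothetical column reader over an in-memory list of per-row values."""

    def __init__(self, values: List[object]) -> None:
        self._values = values
        self._pos = 0
        self.materialized = 0  # number of values actually materialized

    def read_value(self) -> object:
        """Materialize the value at the current position and advance."""
        value = self._values[self._pos]
        self._pos += 1
        self.materialized += 1
        return value

    def skip_values(self, n: int) -> None:
        """Advance past n values without materializing them."""
        self._pos += n


def scan(
    num_rows: int,
    top_level: Dict[str, ColumnReader],
    nested: Dict[str, ColumnReader],
    early_conjuncts: List[Callable[[dict], bool]],
) -> List[dict]:
    """Evaluate conjuncts on top-level columns first and materialize
    nested collections only for rows that pass them."""
    rows = []
    for _ in range(num_rows):
        # Top-level columns are always read, since the conjuncts need them.
        row = {name: reader.read_value() for name, reader in top_level.items()}
        if all(pred(row) for pred in early_conjuncts):
            for name, reader in nested.items():
                row[name] = reader.read_value()
            rows.append(row)
        else:
            # Row is already disqualified: skip its nested collections.
            for reader in nested.values():
                reader.skip_values(1)
    return rows


# Made-up data mirroring "select * from customer c, c.orders o where c.id = 10".
ids = ColumnReader([9, 10, 11])
orders = ColumnReader([["o1"], ["o2", "o3"], ["o4"]])
result = scan(3, {"id": ids}, {"orders": orders}, [lambda r: r["id"] == 10])
print(result, orders.materialized)
```

Only the orders of the one qualifying customer are materialized here. Conjuncts that reference the collection itself, such as the auto-generated !empty() predicates mentioned in the issue, would have to be evaluated in a second phase after the collection is read; the sketch leaves that phase out.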