[
https://issues.apache.org/jira/browse/IMPALA-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Armstrong resolved IMPALA-2108.
-----------------------------------
Assignee: (was: Bharath Vissapragada)
Resolution: Duplicate
> Improve partition pruning by extracting partition-column filters from
> non-trivial disjunctions.
> -----------------------------------------------------------------------------------------------
>
> Key: IMPALA-2108
> URL: https://issues.apache.org/jira/browse/IMPALA-2108
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Affects Versions: Impala 1.2.4, Impala 1.3, Impala 1.4, Impala 2.1, Impala
> 2.2
> Reporter: Alexander Behm
> Priority: Minor
> Labels: newbie, performance
>
> *Problem Statement*
> Impala fails to prune partitions if the partition-column filters are part of
> a "non-trivial" disjunction where each disjunct itself consists of conjuncts
> referencing both partition and non-partition columns.
> Consider the following example:
> {code}
> create table test_table (c1 INT, c2 STRING) PARTITIONED BY (pc INT);
> [localhost.localdomain:21000] > explain select c1 from test_table where (pc=1
> and c2='a') or (pc=2 and c2='b') or (pc=3 and c2='c');
> Query: explain select c1 from test_table where (pc=1 and c2='a') or (pc=2 and
> c2='b') or (pc=3 and c2='c') <-- Partition-column filters inside non-trivial
> djsiunctions
> +----------------------------------------------------------------------------------------+
> | Explain String
> |
> +----------------------------------------------------------------------------------------+
> | Estimated Per-Host Requirements: Memory=176.00MB VCores=1
> |
> | WARNING: The following tables are missing relevant table and/or column
> statistics. |
> | default.test_table
> |
> |
> |
> | 01:EXCHANGE [UNPARTITIONED]
> |
> | |
> |
> | 00:SCAN HDFS [default.test_table]
> |
> | partitions=5/5 files=9 size=36B
> |
> | predicates: (pc = 1 AND c2 = 'a') OR (pc = 2 AND c2 = 'b') OR (pc = 3
> AND c2 = 'c') |
> +----------------------------------------------------------------------------------------+
> Fetched 9 row(s) in 0.04s
> [localhost.localdomain:21000] >
> {code}
> *Cause*
> This is a limitation in how Impala filters partitions.
> *Workaround*
> The above example can be fixed by manually rewriting the predicate as follows:
> {code}
> select c1 from test_table where ((pc=1 and c2='a') or (pc=2 and c2='b') or
> (pc=3 and c2='c')) and (pc=1 OR pc=2 OR pc=3);
> {code}
> *Proposed fix*
> The proposed fix is for Impala to automatically do what is stated in the
> workaround above:
> Extract the partition-column filters from the disjunctions, create a new
> predicate with all those partition-column filters connected with OR, and add
> the new predicate to the original one with AND.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)