[
https://issues.apache.org/jira/browse/IMPALA-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011088#comment-17011088
]
Fang-Yu Rao commented on IMPALA-9281:
-------------------------------------
Thanks [~amansinha]!
After executing the suggested query above, we can see that the predicate of
{{b.c2 = 'one'}} is correctly inferred and pushed down to the scan node. Please
refer to the corresponding query profiles
{{profile_query_1_predicate_on_table_column_parquet.txt}} and
{{profile_query_1_predicate_on_table_column_kudu.txt}} attached.
> Inferred predicates not assigned to scan nodes when views are involved
> ----------------------------------------------------------------------
>
> Key: IMPALA-9281
> URL: https://issues.apache.org/jira/browse/IMPALA-9281
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 3.4.0
> Reporter: Fang-Yu Rao
> Assignee: Fang-Yu Rao
> Priority: Major
> Attachments: profile_query_1_kudu.txt, profile_query_1_parquet.txt,
> profile_query_1_predicate_on_table_column_kudu.txt,
> profile_query_1_predicate_on_table_column_parquet.txt,
> profile_query_2_kudu.txt, profile_query_2_parquet.txt
>
>
> When a query involves the join of views each created based on multiple
> tables, the inferred predicate(s) is(are) not assigned to the scan node(s).
> This issue is/seems related to
> https://issues.apache.org/jira/browse/IMPALA-4578#.
> In the following a minimum example to reproduce the phenomenon using views
> created based on Kudu tables is given. However, we note that this issue is
> not Kudu specific, it also exists when the underlying tables are of Parquet
> format. The respective query profiles are also attached.
> To create the (Kudu) tables in the provided example below, please replace the
> address(es) of the Kudu master(s) accordingly.
> {code:java}
> CREATE TABLE default.t1 (
> c1 INT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
> c2 STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
> PRIMARY KEY (c1)
> )
> PARTITION BY HASH (c1) PARTITIONS 2
> STORED AS KUDU
> TBLPROPERTIES ('kudu.master_addresses'='10.16.0.115');
> insert into t1 values (1, 'one');
> CREATE TABLE default.t2 (
> c1 INT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
> c2 STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
> PRIMARY KEY (c1)
> )
> PARTITION BY HASH (c1) PARTITIONS 2
> STORED AS KUDU
> TBLPROPERTIES ('kudu.master_addresses'='10.16.0.115');
> insert into t2 values (2, 'two');
> CREATE TABLE default.ta1 (
> c1 INT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
> c2 STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
> PRIMARY KEY (c1)
> )
> PARTITION BY HASH (c1) PARTITIONS 2
> STORED AS KUDU
> TBLPROPERTIES ('kudu.master_addresses'='10.16.0.115');
> insert into ta1 values (1,'one');
> CREATE TABLE default.ta2 (
> c1 INT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
> c2 STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
> PRIMARY KEY (c1)
> )
> PARTITION BY HASH (c1) PARTITIONS 2
> STORED AS KUDU
> TBLPROPERTIES ('kudu.master_addresses'='10.16.0.115');
> insert into ta2 values (2,'two');
> CREATE VIEW myview_1_on_2_tables AS
> SELECT 'ONE' table_source, c1, c2 FROM `default`.t1
> UNION ALL
> SELECT 'TWO' table_source, c1, c2 FROM `default`.t2;
> CREATE VIEW myview_2_on_2_tables AS
> SELECT 'ONE' table_source, c1, c2 FROM `default`.ta1
> UNION ALL
> SELECT 'TWO' table_source, c1, c2 FROM `default`.ta2;
> {code}
> For easy reference, the contents of tables {{t1}}, {{t2}}, {{ta1}}, {{ta2}},
> and views {{myview_1_on_2_tables}}, {{myview_2_on_2_tables}} are also given
> as follows.
> Contents of table {{t1}} afterwards:
> {code:java}
> +----+-----+
> | c1 | c2 |
> +----+-----+
> | 1 | one |
> +----+-----+
> {code}
> Contents of table {{t2}} afterwards:
> {code:java}
> +----+-----+
> | c1 | c2 |
> +----+-----+
> | 2 | two |
> +----+-----+
> {code}
> Contents of table {{ta1}} afterwards:
> {code:java}
> +----+-----+
> | c1 | c2 |
> +----+-----+
> | 1 | one |
> +----+-----+
> {code}
> Contents of table {{ta2}} afterwards:
> {code:java}
> +----+-----+
> | c1 | c2 |
> +----+-----+
> | 2 | two |
> +----+-----+
> {code}
> Contents in {{default.myview_1_on_2_tables}} (union of tables {{t1}} and
> {{t2}}):
> {code:java}
> +--------------+----+-----+
> | table_source | c1 | c2 |
> +--------------+----+-----+
> | ONE | 1 | one |
> | TWO | 2 | two |
> +--------------+----+-----+
> {code}
> Contents in {{default.myview_2_on_2_tables}} (union of tables {{ta1}} and
> {{ta2}}):
> {code:java}
> +--------------+----+-----+
> | table_source | c1 | c2 |
> +--------------+----+-----+
> | ONE | 1 | one |
> | TWO | 2 | two |
> +--------------+----+-----+
> {code}
> After creating the related tables and views described above, we consider the
> following 2 queries.
> Query 1:
> {code:java}
> select *
> from default.myview_1_on_2_tables a, myview_2_on_2_tables b
> where a.table_source = 'ONE'
> and a.table_source = b.table_source
> and a.c2 = 'one';
> {code}
> Query 2:
> {code:java}
> select *
> from default.myview_1_on_2_tables a, myview_2_on_2_tables b
> where a.table_source = 'ONE'
> and b.table_source = 'ONE'
> and a.table_source = b.table_source
> and a.c2 = 'one';
> {code}
> Both queries join those 2 views on the column {{table_source}} and filter out
> those rows not satisfying {{table_source = 'ONE'}}. Both queries produce the
> same result set as the following.
> {code:java}
> +--------------+----+-----+--------------+----+-----+
> | table_source | c1 | c2 | table_source | c1 | c2 |
> +--------------+----+-----+--------------+----+-----+
> | ONE | 1 | one | ONE | 1 | one |
> +--------------+----+-----+--------------+----+-----+
> {code}
> However, according to the query profile, Query 1 results in 3 Kudu scans on
> tables {{t1}}, {{ta1}}, and {{ta2}}, respectively. On the other hand, Query 2
> that incorporates the additional/redundant predicate "{{b.table_source =
> 'ONE}}'" only involves 2 Kudu scans on tables {{t1}} and {{ta1}},
> respectively due to this seemingly redundant predicate on {{b.table_source}}.
> Hence, it can be seen that the plan generated from Query 1 is sub-optimal
> since a table that cannot contain any row in the result set is still scanned,
> i.e., table {{ta2}}.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]