[ 
https://issues.apache.org/jira/browse/IMPALA-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011045#comment-17011045
 ] 

Aman Sinha commented on IMPALA-9281:
------------------------------------

[~fangyurao] so it looks like it is unrelated to type of underlying table. 
Couple of places to check are: (a) in the Analyzer.createEquivConjuncts() 
method, see if the new inferred predicate (i.e b.table_source = 'ONE') was 
created or not, (b) if it was created, then check 
SingleNodePlanner.migrateConjunctsToInlineView() to see whether it was migrated 
into the view.

This should be a straightforward predicate inference, but there may be 
something to do with the fact the 'table_source' is not a column in the base 
tables but is only present in the views. For example, if you change the join 
condition to the base table column 'c2',  does it do the right thing ?  i.e run 
the following query: 
{noformat}
select * 
from default.myview_1_on_2_tables a, myview_2_on_2_tables b 
where a.c2 = b.c2 
and a.c2 = 'one'; {noformat}

> Inferred predicates not assigned to scan nodes when views are involved
> ----------------------------------------------------------------------
>
>                 Key: IMPALA-9281
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9281
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.4.0
>            Reporter: Fang-Yu Rao
>            Assignee: Fang-Yu Rao
>            Priority: Major
>         Attachments: profile_query_1_kudu.txt, profile_query_1_parquet.txt, 
> profile_query_2_kudu.txt, profile_query_2_parquet.txt
>
>
> When a query involves the join of views each created based on multiple 
> tables, the inferred predicate(s) is(are) not assigned to the scan node(s). 
> This issue is/seems related to 
> https://issues.apache.org/jira/browse/IMPALA-4578#.
> In the following a minimum example to reproduce the phenomenon using views 
> created based on Kudu tables is given. However, we note that this issue is 
> not Kudu specific, it also exists when the underlying tables are of Parquet 
> format. The respective query profiles are also attached.
> To create the (Kudu) tables in the provided example below, please replace the 
> address(es) of the Kudu master(s) accordingly.
> {code:java}
> CREATE TABLE default.t1 (
>    c1 INT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>    c2 STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>    PRIMARY KEY (c1)
> )
> PARTITION BY HASH (c1) PARTITIONS 2
> STORED AS KUDU 
> TBLPROPERTIES ('kudu.master_addresses'='10.16.0.115');
> insert into t1 values (1, 'one');
> CREATE TABLE default.t2 (
>    c1 INT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>    c2 STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>    PRIMARY KEY (c1)
> )
> PARTITION BY HASH (c1) PARTITIONS 2
> STORED AS KUDU 
> TBLPROPERTIES ('kudu.master_addresses'='10.16.0.115');
> insert into t2 values (2, 'two');
> CREATE TABLE default.ta1 (
>    c1 INT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>    c2 STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>    PRIMARY KEY (c1)
> )
> PARTITION BY HASH (c1) PARTITIONS 2
> STORED AS KUDU
> TBLPROPERTIES ('kudu.master_addresses'='10.16.0.115');
> insert into ta1 values (1,'one');
> CREATE TABLE default.ta2 (
>    c1 INT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>    c2 STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>    PRIMARY KEY (c1)
> )
> PARTITION BY HASH (c1) PARTITIONS 2
> STORED AS KUDU
> TBLPROPERTIES ('kudu.master_addresses'='10.16.0.115');
> insert into ta2 values (2,'two');
> CREATE VIEW myview_1_on_2_tables AS  
> SELECT 'ONE' table_source, c1, c2 FROM `default`.t1 
> UNION ALL 
> SELECT 'TWO' table_source, c1, c2 FROM `default`.t2;
> CREATE VIEW myview_2_on_2_tables AS  
> SELECT 'ONE' table_source, c1, c2 FROM `default`.ta1 
> UNION ALL 
> SELECT 'TWO' table_source, c1, c2 FROM `default`.ta2;
> {code}
> For easy reference, the contents of tables {{t1}}, {{t2}}, {{ta1}}, {{ta2}}, 
> and views {{myview_1_on_2_tables}}, {{myview_2_on_2_tables}} are also given 
> as follows.
> Contents of table {{t1}} afterwards:
> {code:java}
> +----+-----+
> | c1 | c2  |
> +----+-----+
> | 1  | one |
> +----+-----+
> {code}
> Contents of table {{t2}} afterwards:
> {code:java}
> +----+-----+
> | c1 | c2  |
> +----+-----+
> | 2  | two |
> +----+-----+
> {code}
> Contents of table {{ta1}} afterwards:
> {code:java}
> +----+-----+
> | c1 | c2  |
> +----+-----+
> | 1  | one |
> +----+-----+
> {code}
> Contents of table {{ta2}} afterwards:
> {code:java}
> +----+-----+
> | c1 | c2  |
> +----+-----+
> | 2  | two |
> +----+-----+
> {code}
> Contents in {{default.myview_1_on_2_tables}} (union of tables {{t1}} and 
> {{t2}}):
> {code:java}
> +--------------+----+-----+
> | table_source | c1 | c2  |
> +--------------+----+-----+
> | ONE          | 1  | one |
> | TWO          | 2  | two |
> +--------------+----+-----+
> {code}
> Contents in {{default.myview_2_on_2_tables}} (union of tables {{ta1}} and 
> {{ta2}}):
> {code:java}
> +--------------+----+-----+
> | table_source | c1 | c2  |
> +--------------+----+-----+
> | ONE          | 1  | one |
> | TWO          | 2  | two |
> +--------------+----+-----+
> {code}
> After creating the related tables and views described above, we consider the 
> following 2 queries.
> Query 1:
> {code:java}
> select * 
> from default.myview_1_on_2_tables a, myview_2_on_2_tables b 
> where a.table_source = 'ONE' 
> and a.table_source = b.table_source 
> and a.c2 = 'one';
> {code}
> Query 2:
> {code:java}
> select * 
> from default.myview_1_on_2_tables a, myview_2_on_2_tables b 
> where a.table_source = 'ONE' 
> and b.table_source = 'ONE' 
> and a.table_source = b.table_source 
> and a.c2 = 'one';
> {code}
> Both queries join those 2 views on the column {{table_source}} and filter out 
> those rows not satisfying {{table_source = 'ONE'}}. Both queries produce the 
> same result set as the following.
> {code:java}
> +--------------+----+-----+--------------+----+-----+
> | table_source | c1 | c2  | table_source | c1 | c2  |
> +--------------+----+-----+--------------+----+-----+
> | ONE          | 1  | one | ONE          | 1  | one |
> +--------------+----+-----+--------------+----+-----+
> {code}
> However, according to the query profile, Query 1 results in 3 Kudu scans on 
> tables {{t1}}, {{ta1}}, and {{ta2}}, respectively. On the other hand, Query 2 
> that incorporates the additional/redundant predicate "{{b.table_source = 
> 'ONE}}'" only involves 2 Kudu scans on tables {{t1}} and {{ta1}}, 
> respectively due to this seemingly redundant predicate on {{b.table_source}}.
> Hence, it can be seen that the plan generated from Query 1 is sub-optimal 
> since a table that cannot contain any row in the result set is still scanned, 
> i.e., table {{ta2}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to