[
https://issues.apache.org/jira/browse/IMPALA-9356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287831#comment-17287831
]
Aman Sinha commented on IMPALA-9356:
------------------------------------
I checked on Postgres 9.6. The plan shows that the non-deterministic function
predicate is evaluated only at the Full Outer Join and is not pushed to any scan:
{noformat}
explain select * from (select a3, upper(b3) from t3) as dt1 full outer join
(select a4, upper(b4) from t4) as dt2 on a3 = a4 where random() = 0.5;
Merge Full Join (cost=161.29..301.82 rows=34 width=72)
Merge Cond: (t3.a3 = t4.a4)
Filter: (random() = '0.5'::double precision) <--- predicate
-> Sort (cost=80.64..83.54 rows=1160 width=42)
Sort Key: t3.a3
-> Seq Scan on t3 (cost=0.00..21.60 rows=1160 width=42)
-> Sort (cost=80.64..83.54 rows=1160 width=42)
Sort Key: t4.a4
-> Seq Scan on t4 (cost=0.00..21.60 rows=1160 width=42)
{noformat}
Hive also does not push the predicate to the scans in the full outer join (FOJ) case:
{noformat}
| CBO PLAN: |
| HiveAggregate(group=[{}], agg#0=[count()]) |
| HiveFilter(condition=[=(rand(), 0.5)]) |
| HiveJoin(condition=[=($0, $1)], joinType=[full]) |
| HiveProject(id=[$0]) |
| HiveTableScan(table=[[functional, alltypessmall]], table:alias=[t1]) |
| HiveProject(id=[$0]) |
| HiveTableScan(table=[[functional, alltypestiny]], table:alias=[t2]) |
{noformat}
> The predicates that the tuple ids involved are empty migrate to outer-joined
> inline view or real table
> ------------------------------------------------------------------------------------------------------
>
> Key: IMPALA-9356
> URL: https://issues.apache.org/jira/browse/IMPALA-9356
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 3.3.0
> Reporter: Xianqing He
> Assignee: Xianqing He
> Priority: Minor
> Labels: correctness
>
> {code}
> SELECT COUNT(*)
> FROM (
> SELECT id, upper(string_col) AS upper_val
> FROM functional.alltypestiny
> ) a
> FULL JOIN (
> SELECT id, upper(string_col) AS upper_val
> FROM functional.alltypestiny
> ) b
> ON a.id = b.id
> WHERE rand() = 12
> {code}
> The Plan
> {noformat}
> +------------------------------------------------------------+
> | Explain String |
> +------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=1.95MB Threads=6 |
> | Per-Host Resource Estimates: Memory=86MB |
> | Codegen disabled by planner |
> | |
> | PLAN-ROOT SINK |
> | | |
> | 07:AGGREGATE [FINALIZE] |
> | | output: count:merge(*) |
> | | row-size=8B cardinality=1 |
> | | |
> | 06:EXCHANGE [UNPARTITIONED] |
> | | |
> | 03:AGGREGATE |
> | | output: count(*) |
> | | row-size=8B cardinality=1 |
> | | |
> | 02:HASH JOIN [FULL OUTER JOIN, PARTITIONED] |
> | | hash predicates: id = id |
> | | row-size=8B cardinality=9 |
> | | |
> | |--05:EXCHANGE [HASH(id)] |
> | | | |
> | | 00:SCAN HDFS [functional.alltypestiny] |
> | | HDFS partitions=4/4 files=4 size=460B |
> | | predicates: rand() = 12 |
> | | row-size=4B cardinality=1 |
> | | |
> | 04:EXCHANGE [HASH(id)] |
> | | |
> | 01:SCAN HDFS [functional.alltypestiny] |
> | HDFS partitions=4/4 files=4 size=460B |
> | row-size=4B cardinality=8 |
> +------------------------------------------------------------+
> {noformat}
> rand() returns a random value between 0 and 1, so "rand() = 12" is always
> false and the WHERE clause should reject all rows. If "rand() = 12" is
> evaluated on only one side of the join, the other side can still produce
> rows, so the outer join still returns results.
> A predicate that references no tuple ids must not be migrated into an
> outer-joined inline view. The same problem occurs with real tables:
> {code}
> explain select 1 from functional.alltypestiny t1 full join
> functional.alltypestiny t2 on t1.id = t2.id where rand() = 2
> {code}
> {noformat}
> +------------------------------------------------------------+
> | Explain String |
> +------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=1.95MB Threads=6 |
> | Per-Host Resource Estimates: Memory=66MB |
> | Codegen disabled by planner |
> | |
> | PLAN-ROOT SINK |
> | | |
> | 05:EXCHANGE [UNPARTITIONED] |
> | | |
> | 02:HASH JOIN [FULL OUTER JOIN, PARTITIONED] |
> | | hash predicates: t2.id = t1.id |
> | | row-size=8B cardinality=9 |
> | | |
> | |--04:EXCHANGE [HASH(t1.id)] |
> | | | |
> | | 00:SCAN HDFS [functional.alltypestiny t1] |
> | | HDFS partitions=4/4 files=4 size=460B |
> | | predicates: rand() = 2 |
> | | row-size=4B cardinality=1 |
> | | |
> | 03:EXCHANGE [HASH(t2.id)] |
> | | |
> | 01:SCAN HDFS [functional.alltypestiny t2] |
> | HDFS partitions=4/4 files=4 size=460B |
> | row-size=4B cardinality=8 |
> +------------------------------------------------------------+
> {noformat}
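The argument in the issue description can be checked with a small simulation. The following is a minimal Python sketch, not Impala code: `full_outer_join` and `always_false` are hypothetical helpers, and the eight integer ids are stand-in data chosen to match the scan cardinality of 8 in the plans. It compares evaluating an always-false predicate on the join output with evaluating it in the scan of only one side.

```python
import random


def full_outer_join(left, right):
    """Full outer join of two id lists on equality.

    Rows without a match on the other side are padded with None.
    """
    out = []
    matched_right = set()
    for l in left:
        hits = [r for r in right if r == l]
        if hits:
            out.extend((l, r) for r in hits)
            matched_right.update(hits)
        else:
            out.append((l, None))
    # Right-side rows that never matched are preserved by the outer join.
    out.extend((None, r) for r in right if r not in matched_right)
    return out


def always_false():
    # Stand-in for "rand() = 12": random.random() is in [0, 1),
    # so the comparison never holds.
    return random.random() == 12


# Hypothetical stand-in data: eight ids per side.
t1 = list(range(8))
t2 = list(range(8))

# Predicate evaluated on the join output, as in the Postgres and Hive plans.
filtered_after_join = [row for row in full_outer_join(t1, t2) if always_false()]

# Predicate pushed into the scan of one side only, as in the Impala plans.
t1_scanned = [x for x in t1 if always_false()]
pushed_to_scan = full_outer_join(t1_scanned, t2)

print(len(filtered_after_join))  # 0
print(len(pushed_to_scan))       # 8
```

With the predicate applied after the join, no rows survive. With the predicate applied only to the `t1` scan, every `t2` row is still emitted with a NULL-padded left side, which is the incorrect result the issue describes.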