Paul Rogers created IMPALA-7996:
-----------------------------------
Summary: Very inefficient plan for pathological ON FALSE clause
Key: IMPALA-7996
URL: https://issues.apache.org/jira/browse/IMPALA-7996
Project: IMPALA
Issue Type: Bug
Components: Frontend
Affects Versions: Impala 3.1.0
Reporter: Paul Rogers
Bugs in the rewrite code prevented the ON clause from being subject to the full
set of rewrite rules. See IMPALA-XXXX.
Once that bug was fixed, it uncovered an inefficient plan for the pathological
case of {{ON FALSE}}.
{code:sql}
# Constant conjunct in the ON-clause of an outer join is
# assigned to the join.
select *
from functional.alltypessmall a
right outer join functional.alltypestiny b
on (a.id = b.id and !true)
{code}
Note the ON clause and its actual meaning:
{noformat}
a.id = b.id and !true --> a.id = b.id AND FALSE --> FALSE
{noformat}
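The simplification above can be sketched with a tiny bottom-up rewriter. This is a minimal illustration in Python, not Impala's rewrite code: the tuple-based expression encoding and the `simplify` function are assumptions made for the example, and the equality predicate is treated as an opaque string.

```python
# Hypothetical expression encoding (not Impala's Expr classes):
#   True / False        boolean literals
#   "a.id = b.id"       opaque, non-constant predicate
#   ("not", e)          logical negation
#   ("and", l, r)       conjunction

def simplify(expr):
    """Fold constant boolean sub-expressions bottom-up."""
    if isinstance(expr, tuple):
        op = expr[0]
        if op == "not":
            inner = simplify(expr[1])
            if isinstance(inner, bool):
                return not inner
            return ("not", inner)
        if op == "and":
            left, right = simplify(expr[1]), simplify(expr[2])
            # x AND FALSE is FALSE in SQL even when x is NULL.
            if left is False or right is False:
                return False
            if left is True:
                return right
            if right is True:
                return left
            return ("and", left, right)
    # Literals and opaque predicates are returned unchanged.
    return expr

on_clause = ("and", "a.id = b.id", ("not", True))
print(simplify(on_clause))  # False
```

Note that dropping the non-constant conjunct is safe here because `x AND FALSE` evaluates to FALSE under SQL's three-valued logic regardless of `x`.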
(Note that a test should be added for the straight-up FALSE condition.)
Once rewrites are applied (or, presumably, if we have a test for
{{ON FALSE}}), we get a new plan, which is actually worse:
{noformat}
PLAN-ROOT SINK
|
02:NESTED LOOP JOIN [RIGHT OUTER JOIN]
|  join predicates: FALSE
|
|--01:SCAN HDFS [functional.alltypestiny b]
|     partitions=4/4 files=4 size=460B
|
00:SCAN HDFS [functional.alltypessmall a]
   partitions=4/4 files=4 size=6.32KB
{noformat}
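This says to loop over both tables, which is a very bad plan. Since the join
predicate is constant FALSE, no row of {{a}} can ever match, so the correct plan
skips both the scan of {{a}} and the join: it returns each row of {{b}} with
NULLs for the columns of {{a}}. (Only an inner join would reduce to an empty
result set.)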
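To check what a cheaper plan has to produce, here is a minimal sketch in Python of a nested-loop right outer join over in-memory tuples. The function `right_outer_join`, the `left_width` parameter, and the sample rows are all hypothetical illustrations, not Impala code. Under these assumptions, a constant-FALSE predicate never matches, so the output is each row of the preserved side ({{b}}) padded with NULLs, and the other input never needs to be read. The result would be empty only for an inner join.

```python
def right_outer_join(left_rows, right_rows, predicate, left_width):
    """Nested-loop RIGHT OUTER JOIN over lists of tuples (illustrative only)."""
    out = []
    for r in right_rows:
        matched = False
        for l in left_rows:
            if predicate(l, r):
                out.append(l + r)
                matched = True
        if not matched:
            # Preserve the right-side row; pad the left columns with NULLs.
            out.append((None,) * left_width + r)
    return out

# Made-up sample rows standing in for alltypessmall (a) and alltypestiny (b).
a = [(1, "a1"), (2, "a2")]
b = [(1, "b1"), (3, "b3")]

# ON FALSE: the predicate ignores its inputs and never matches.
joined = right_outer_join(a, b, lambda l, r: False, left_width=2)

# Equivalent result computed without touching `a` at all.
null_padded = [(None, None) + r for r in b]

print(joined == null_padded)  # True
```

The same argument applies to the left outer join case with the roles of the two inputs swapped.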
Similar case:
{code:sql}
# Constant conjunct in the ON-clause of an outer join is
# assigned to the join.
select *
from functional.alltypessmall a
left outer join functional.alltypestiny b
on (a.id = b.id and 1 + 1 > 10)
{code}
Reasoning:
{noformat}
a.id = b.id and 1 + 1 > 10 --> a.id = b.id AND 2 > 10
--> a.id = b.id AND FALSE --> FALSE
{noformat}
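The chain of rewrites above combines arithmetic constant folding with boolean simplification. A minimal sketch in Python follows. The `fold` function and the tuple encoding are assumptions for illustration and do not correspond to Impala's rewrite rules. Integers stand for numeric literals and strings for non-constant predicates.

```python
import operator

# Binary operators this sketch knows how to fold when both sides are literals.
FOLDABLE = {"+": operator.add, ">": operator.gt}

def fold(expr):
    """Fold (op, left, right) tuples bottom-up; hypothetical encoding."""
    if not isinstance(expr, tuple):
        return expr  # literal or opaque predicate string
    op, left, right = expr[0], fold(expr[1]), fold(expr[2])
    if op == "and":
        # x AND FALSE is FALSE regardless of x (including NULL).
        if left is False or right is False:
            return False
        return ("and", left, right)
    if op in FOLDABLE and isinstance(left, int) and isinstance(right, int):
        return FOLDABLE[op](left, right)
    return (op, left, right)

# a.id = b.id AND 1 + 1 > 10
on_clause = ("and", "a.id = b.id", (">", ("+", 1, 1), 10))
print(fold(on_clause))  # False
```

Each intermediate step matches the reasoning shown: `1 + 1` folds to `2`, `2 > 10` folds to `False`, and the conjunction then collapses to `False`.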
The plan after performing the above rewrites is also an unnecessary nested
loop join:
{noformat}
PLAN-ROOT SINK
|
02:NESTED LOOP JOIN [LEFT OUTER JOIN]
|  join predicates: FALSE
|
|--01:SCAN HDFS [functional.alltypestiny b]
|     partitions=4/4 files=4 size=460B
|
00:SCAN HDFS [functional.alltypessmall a]
   partitions=4/4 files=4 size=6.32KB
{noformat}