Paul Rogers created IMPALA-7996:
-----------------------------------

             Summary: Very inefficient plan for pathological ON FALSE clause
                 Key: IMPALA-7996
                 URL: https://issues.apache.org/jira/browse/IMPALA-7996
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 3.1.0
            Reporter: Paul Rogers


Bugs in the rewrite code prevented the ON clause from being subject to the full 
set of rewrite rules. See IMPALA-XXXX.

Fixing that bug uncovered an inefficient plan for the pathological 
case of {{ON FALSE}}.

{code:sql}
# Constant conjunct in the ON-clause of an outer join is
# assigned to the join.
select *
from functional.alltypessmall a
right outer join functional.alltypestiny b
on (a.id = b.id and !true)
{code}

Note the ON clause and its actual meaning:

{noformat}
a.id = b.id and !true --> a.id = b.id AND FALSE --> FALSE
{noformat}
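
The simplification above can be sketched as a tiny constant-folding pass. This is only an illustration under stated assumptions: the {{Const}}, {{Pred}}, {{Not}}, {{And}} classes and the {{simplify}} function are hypothetical names made up for this example and are not Impala's rewrite code. Folding {{x AND FALSE}} to {{FALSE}} is safe under SQL's three-valued logic, since {{NULL AND FALSE}} is also {{FALSE}}.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:
    value: bool


@dataclass(frozen=True)
class Pred:
    # Opaque non-constant predicate, e.g. "a.id = b.id" (hypothetical node type).
    text: str


@dataclass(frozen=True)
class Not:
    child: "Expr"


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


Expr = Union[Const, Pred, Not, And]


def simplify(e: Expr) -> Expr:
    """Fold constants bottom-up: NOT <const>, x AND FALSE, x AND TRUE."""
    if isinstance(e, Not):
        child = simplify(e.child)
        if isinstance(child, Const):
            return Const(not child.value)
        return Not(child)
    if isinstance(e, And):
        left, right = simplify(e.left), simplify(e.right)
        # x AND FALSE is FALSE regardless of x (even if x is NULL).
        if Const(False) in (left, right):
            return Const(False)
        # x AND TRUE is x.
        if left == Const(True):
            return right
        if right == Const(True):
            return left
        return And(left, right)
    return e


# a.id = b.id AND !true
on_clause = And(Pred("a.id = b.id"), Not(Const(True)))
print(simplify(on_clause))
```

Here {{simplify(on_clause)}} evaluates to {{Const(False)}}, matching the chain of rewrites shown above.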

(Note that a test should be added for the straight-up FALSE condition.)

Once rewrites are applied (or, presumably, if the query is written with {{ON 
FALSE}} directly), we get a new plan, which is actually worse:

{noformat}
PLAN-ROOT SINK
|
02:NESTED LOOP JOIN [RIGHT OUTER JOIN]
|  join predicates: FALSE
|
|--01:SCAN HDFS [functional.alltypestiny b]
|     partitions=4/4 files=4 size=460B
|
00:SCAN HDFS [functional.alltypessmall a]
   partitions=4/4 files=4 size=6.32KB
{noformat}

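This says to loop over both tables, which is a very bad plan. Since the join 
predicate is the constant FALSE, no pair of rows can match, so the result of 
the outer join is just the rows of the preserved side ({{b}} here) with NULLs 
for the columns of {{a}}. A better plan would skip the scan of {{a}} and the 
nested loop join entirely.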

Similar case:

{code:sql}
# Constant conjunct in the ON-clause of an outer join is
# assigned to the join.
select *
from functional.alltypessmall a
left outer join functional.alltypestiny b
on (a.id = b.id and 1 + 1 > 10)
{code}

Reasoning:

{noformat}
a.id = b.id and 1 + 1 > 10 --> a.id = b.id AND 2 > 10
   --> a.id = b.id AND FALSE --> FALSE
{noformat}

The plan after performing the above rewrites is also an unnecessary nested 
loop join:

{noformat}
PLAN-ROOT SINK
|
02:NESTED LOOP JOIN [LEFT OUTER JOIN]
|  join predicates: FALSE
|
|--01:SCAN HDFS [functional.alltypestiny b]
|     partitions=4/4 files=4 size=460B
|
00:SCAN HDFS [functional.alltypessmall a]
   partitions=4/4 files=4 size=6.32KB
{noformat}
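
As a sanity check on what such a query must return, the semantics can be tried against any SQL engine. The sketch below uses Python's built-in {{sqlite3}} module as a stand-in, not Impala. The tables {{a}} and {{b}} and their contents are made up for illustration and are much smaller than the {{functional}} tables. It compares the left outer join from the second example with a plain scan of the preserved side padded with NULLs.

```python
import sqlite3


def outer_join_on_false_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE a (id INTEGER, val TEXT);  -- stand-in for alltypessmall
        CREATE TABLE b (id INTEGER, val TEXT);  -- stand-in for alltypestiny
        INSERT INTO a VALUES (1, 'a1'), (2, 'a2'), (3, 'a3');
        INSERT INTO b VALUES (1, 'b1'), (2, 'b2');
    """)
    # Outer join whose ON clause folds to the constant FALSE.
    joined = conn.execute("""
        SELECT a.id, a.val, b.id, b.val
        FROM a LEFT OUTER JOIN b
        ON (a.id = b.id AND 1 + 1 > 10)
        ORDER BY a.id
    """).fetchall()
    # Scan of the preserved side only, NULL-padded for b's columns.
    scan_only = conn.execute("""
        SELECT id, val, NULL, NULL FROM a ORDER BY id
    """).fetchall()
    conn.close()
    return joined, scan_only


joined, scan_only = outer_join_on_false_demo()
print(joined == scan_only)
```

In this stand-in, both queries return the same rows: every row of the preserved table, with NULLs for the other table's columns. That suggests a planner could answer the outer join without scanning the non-preserved table at all.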



