[ 
https://issues.apache.org/jira/browse/IMPALA-7952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated IMPALA-7952:
--------------------------------
    Description: 
The FE has a "normalize binary predicates" rule that puts slots on the 
left-hand side:

{noformat}
1 = id --> id = 1
{noformat}
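
A minimal sketch of what such a rule does, for illustration only. The classes below ({{Expr}}, {{BinaryPred}}, {{NormalizeSketch}}) are hypothetical stand-ins, not Impala's actual FE classes, and the real rule may handle more cases. Note that swapping operands also requires flipping non-symmetric operators.

```java
import java.util.Map;

// Hypothetical stand-ins for planner expressions; these are NOT Impala's classes.
final class NormalizeSketch {
    /** A minimal expression: either a slot (column) reference or a literal. */
    static final class Expr {
        final String text;
        final boolean isSlot;
        Expr(String text, boolean isSlot) { this.text = text; this.isSlot = isSlot; }
        static Expr slot(String name) { return new Expr(name, true); }
        static Expr literal(String value) { return new Expr(value, false); }
    }

    /** A binary predicate of the form "left op right". */
    static final class BinaryPred {
        final Expr left;
        final String op;
        final Expr right;
        BinaryPred(Expr left, String op, Expr right) {
            this.left = left; this.op = op; this.right = right;
        }
        @Override public String toString() {
            return left.text + " " + op + " " + right.text;
        }
    }

    // Operator to use once the operands are swapped: "a < b" is the same as "b > a".
    private static final Map<String, String> CONVERSE = Map.of(
        "=", "=", "!=", "!=", "<", ">", ">", "<", "<=", ">=", ">=", "<=");

    /** Moves the slot to the left when only the right operand is a slot. */
    static BinaryPred normalize(BinaryPred p) {
        String flipped = CONVERSE.get(p.op);
        if (flipped == null || p.left.isSlot || !p.right.isSlot) {
            return p;  // Unknown operator or already normalized: leave unchanged.
        }
        return new BinaryPred(p.right, flipped, p.left);
    }

    public static void main(String[] args) {
        BinaryPred p = new BinaryPred(Expr.literal("1"), "=", Expr.slot("id"));
        System.out.println(normalize(p));  // id = 1
    }
}
```

Applying {{normalize()}} to a predicate that is already in normalized form returns it unchanged, so the rule is safe to apply more than once.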

Presumably this is useful. As the planner proceeds, it creates additional 
binary predicates, but tends to create them in the non-normalized form.

Examples:

* {{Expr.trySubstitute()}}
* {{StmtRewriter.createJoinConjunct()}}
* {{SingleNodePlanner.getNormalizedEqPred()}}
* {{StmtRewriter.rewriteWhereClauseSubqueries()}}

Once rewrite rules are integrated into analysis, we end up with a conflict: 
should expressions created internally be exempt from some or all of the rewrite 
rules? Even from mandatory rules, such as this one?

The solution is to allow such expressions to be rewritten to normalized form as 
part of the new integrated analyze-and-rewrite logic.
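
One way to picture this, as a sketch under stated assumptions rather than the actual implementation: every predicate, whether written by the user or created by the planner, goes through a single creation path that applies the same rules. All names here ({{RewriteOnCreateSketch}}, {{Pred}}, {{create()}}, {{SLOT_ON_LEFT}}) are hypothetical, and the slot test is a crude string check used only to keep the example short.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch; none of these names are Impala's.
final class RewriteOnCreateSketch {
    /** Predicate "left op right", with operands kept as plain strings. */
    static final class Pred {
        final String left;
        final String op;
        final String right;
        Pred(String left, String op, String right) {
            this.left = left; this.op = op; this.right = right;
        }
        @Override public String toString() { return left + " " + op + " " + right; }
    }

    // Assumption for this sketch: a slot is a bare, optionally qualified identifier.
    static boolean isSlot(String s) {
        return s.matches("[A-Za-z_][A-Za-z0-9_.]*");
    }

    // Mandatory rule: slot on the left. Handles only "=" to keep the sketch short.
    static final UnaryOperator<Pred> SLOT_ON_LEFT = p ->
        (p.op.equals("=") && !isSlot(p.left) && isSlot(p.right))
            ? new Pred(p.right, p.op, p.left)
            : p;

    static final List<UnaryOperator<Pred>> RULES = List.of(SLOT_ON_LEFT);

    /** Single creation path: internally created predicates get the same rules. */
    static Pred create(String left, String op, String right) {
        Pred p = new Pred(left, op, right);
        for (UnaryOperator<Pred> rule : RULES) {
            p = rule.apply(p);
        }
        return p;
    }

    public static void main(String[] args) {
        // A planner-created join conjunct is normalized like any other predicate.
        System.out.println(create("coalesce(t1.id, t2.id)", "=", "t3.id"));
    }
}
```

With this shape, no caller can produce a non-normalized predicate by accident, at the cost of every caller having to tolerate operands being swapped.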

Note that the {{trySubstitute()}} case needs more attention. Presumably the 
expressions put into the "smap" are analyzed, hence rewritten. If not, then 
there are probably other subtle bugs lurking in that code.

Fixing this bug caused plans to change in {{PlannerTest.testJoins()}}. These 
changes suggest that one part of the analyzer works to create the "<slot> <op> 
<expr>" pattern, while other parts strive for the opposite, creating 
instability. Requires more research.

{code:sql}
# test that on-clause predicates referring to multiple tuple ids
# get registered as eq join conjuncts
select t1.*
from (select * from functional.alltypestiny) t1
  join (select * from functional.alltypestiny) t2 on (t1.id = t2.id)
  join functional.alltypestiny t3 on (coalesce(t1.id, t2.id) = t3.id)
{code}

Plan before the fix:

{noformat}
PLAN-ROOT SINK
|
04:HASH JOIN [INNER JOIN]
|  hash predicates: coalesce(functional.alltypestiny.id, functional.alltypestiny.id) = t3.id
|  runtime filters: RF000 <- t3.id
|
|--02:SCAN HDFS [functional.alltypestiny t3]
|     partitions=4/4 files=4 size=460B
|
03:HASH JOIN [INNER JOIN]
|  hash predicates: functional.alltypestiny.id = functional.alltypestiny.id
|  runtime filters: RF002 <- functional.alltypestiny.id
|
|--01:SCAN HDFS [functional.alltypestiny]
|     partitions=4/4 files=4 size=460B
|     runtime filters: RF000 -> coalesce(functional.alltypestiny.id, functional.alltypestiny.id)
|
00:SCAN HDFS [functional.alltypestiny]
   partitions=4/4 files=4 size=460B
   runtime filters: RF000 -> coalesce(functional.alltypestiny.id, functional.alltypestiny.id), RF002 -> functional.alltypestiny.id
{noformat}

Plan after the fix, with the filter pushed further down the plan:

{noformat}
PLAN-ROOT SINK
|
04:HASH JOIN [INNER JOIN]
|  hash predicates: t3.id = coalesce(functional.alltypestiny.id, functional.alltypestiny.id)
|
|--02:SCAN HDFS [functional.alltypestiny t3]
|     partitions=4/4 files=4 size=460B
|
03:HASH JOIN [INNER JOIN]
|  hash predicates: functional.alltypestiny.id = functional.alltypestiny.id
|  runtime filters: RF002 <- functional.alltypestiny.id
|
|--01:SCAN HDFS [functional.alltypestiny]
|     partitions=4/4 files=4 size=460B
|
00:SCAN HDFS [functional.alltypestiny]
   partitions=4/4 files=4 size=460B
   runtime filters: RF002 -> functional.alltypestiny.id
{noformat}


  was:
The FE has a "normalize binary predicates" rule that puts slots on the left 
hand side:

{noformat}
1 = id --> id = 1
{noformat}

Presumably this is useful. As the planner proceeds, it creates additional 
binary predicates, but tends to create them in the non-normalized form.

Examples:

* {{Expr.trySubstitute()}}
* {{StmtRewriter.createJoinConjunct()}}
* {{SingleNodePlanner.getNormalizedEqPred()}}

Once rewrite rules are integrated into analysis, we end up with a conflict: 
should expressions created internally be exempt from some or all of the rewrite 
rules? Even from mandatory rules, such as this one?

The solution is to allow such expressions to be rewritten to normalized form as 
part of the new integrated analyze-and-rewrite logic.

Note that the {{trySubstitute()}} case needs more attention. Presumably the 
expressions put into the "smap" are analyzed, hence rewritten. If not, then 
there are probably other subtle bugs lurking in that code.


> Planner creates non-normalized binary predicates
> ------------------------------------------------
>
>                 Key: IMPALA-7952
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7952
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.1.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>



