[ 
https://issues.apache.org/jira/browse/IMPALA-7754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16679132#comment-16679132
 ] 

Paul Rogers commented on IMPALA-7754:
-------------------------------------

There be dragons here. It seems that, without rewrites, we get slightly wrong 
results in certain pathological queries. Consider this case in 
{{test-empty.sql}}:

{noformat}
# Constant conjunct in the ON-clause of an outer join is
# assigned to the join.
select *
from functional.alltypessmall a
left outer join functional.alltypestiny b
on (a.id = b.id and 1 + 1 > 10)
---- PLAN
Max Per-Host Resource Reservation: Memory=1.95MB Threads=3
Per-Host Resource Estimates: Memory=66MB
PLAN-ROOT SINK
|
02:HASH JOIN [LEFT OUTER JOIN]
|  hash predicates: a.id = b.id
|  other join predicates: FALSE
|
|--01:SCAN HDFS [functional.alltypestiny b]
|     partitions=4/4 files=4 size=460B
|
00:SCAN HDFS [functional.alltypessmall a]
   partitions=4/4 files=4 size=6.32KB
{noformat}

If we re-analyze expressions after rewrite, we get the following, which seems 
more accurate:

{noformat}
Max Per-Host Resource Reservation: Memory=16.00KB Threads=3
Per-Host Resource Estimates: Memory=64MB
PLAN-ROOT SINK
|
02:NESTED LOOP JOIN [LEFT OUTER JOIN]
|  join predicates: (FALSE)
{noformat}

Note that, in the current expected results (the top example), the predicate 
{{a.id = b.id}} is pushed down, ignoring the fact that it is part of a 
predicate which reduces to {{FALSE}}. In the second version, the planner has 
correctly identified that the join produces no rows. Also, the memory cost is 
lower for the second example, Hence, the current behavior (first example) 
appears wrong.


> Expressions sometimes not re-analyzed after rewrite
> ---------------------------------------------------
>
>                 Key: IMPALA-7754
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7754
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.0
>            Reporter: Paul Rogers
>            Priority: Major
>
> The analyzer has a chain of rules which fire in order without (as noted 
> above) repeats. The result of rule A (rewriting conditional functions) is fed 
> into rule B (simplify CASE). Each rule requires that analysis be done so that 
> attributes of expressions can be picked out.
> As it turns out, in the current code, this is rather ad-hoc. The 
> {{SimplifyConditionalsRule}} re-analyzes its result as part of the fix for 
> IMPALA-5125, but others do not, leading to optimizations not working. In 
> particular, in a chain of rewrites for {{IS DISTINCT FROM}}, certain rules 
> didn't fire because previous rules left new expressions in an un-analyzed 
> state. This is a bug.
> The fix is to analyze the result any time a rule fires, before passing the 
> result to the next rule.
> {code:java}
>   private Expr applyRuleBottomUp(Expr expr, ExprRewriteRule rule, Analyzer 
> analyzer)
>       throws AnalysisException {
>     ...
>     Expr rewrittenExpr = rule.apply(expr, analyzer);
>     if (rewrittenExpr != expr) {
>       ++numChanges_;
>       rewrittenExpr.analyze(analyzer); // Add me!
>     }}
>     return rewrittenExpr;
>   }
> {code}
> There are several places that the above logic appears: make the change in all 
> of them.
> Then, in rules that simply refused to run if an expression is to analyzed:
> {code:java}
> public class SimplifyDistinctFromRule implements ExprRewriteRule {
>   public Expr apply(Expr expr, Analyzer analyzer) {
>     if (!expr.isAnalyzed()) return expr;
> {code}
> Replace this with an assertion that analysis must have been done:
> {code:java}
> public class SimplifyDistinctFromRule implements ExprRewriteRule {
>   public Expr apply(Expr expr, Analyzer analyzer) {
>     assert expr.isAnalyzed();
> {code}
> To be safe, the assertion fires only in "debug" mode, not in production.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to