[ 
https://issues.apache.org/jira/browse/IMPALA-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661102#comment-16661102
 ] 

Paul Rogers commented on IMPALA-7747:
-------------------------------------

It can be hard to see the generated code. Here are the steps required to do so.

The logic to enable codegen is a bit baroque with many factors considered.

* No code generation is ever done for expressions in the {{SELECT}} clause if 
they land in the root fragment (IMPALA-4356).
* No code generation is done for small-cardinality tables.
* Team members suggested using {{set DISABLE_CODEGEN_ROWS_THRESHOLD=0;}}, but 
this had no effect when working with a small table.
* When working with a table with no stats, the planner refuses to estimate 
cardinality, and the BE appears to assume it is large. So, one can cause CG to 
occur with a small table if that table has no stats.

To see the generated code, launch Impala as:

{noformat}
start-impala-cluster.py -s 1 --impalad_args -dump_ir
{noformat}

Then create two simple tables:

{code:sql}
create database test;
use test;

create table bools (b boolean);
insert into bools values (true), (false), (null);

create table ints (n int);
insert into ints values (null), (0), (1), (2);
{code}

Note: *do not* compute stats!

CG won't occur for the {{SELECT}} clause in the following:

{code:sql}
select case when b = true then "foo" else "bar" end from bools;
{code}

The workaround is to move the expression to the {{WHERE}} clause, then CG 
occurs and the code can be dumped to the log:

{code:sql}
select b FROM bools WHERE (case when b = true then "foo" else "bar" end) = 
"foo";
{code}

> Review and modernize conditional function rewrites
> --------------------------------------------------
>
>                 Key: IMPALA-7747
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7747
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 3.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>
> IMPALA-7655 asks to revisit the rewrite rules for several conditional 
> functions. [~philip] suggested that the rewrite rules should apply to [all of 
> them|https://impala.apache.org/docs/build3x/html/topics/impala_conditional_functions.html].
>  To keep IMPALA-7655 focused, the larger review is presented here, along with 
> suggested  opportunities to modernize the front-end rewrite rules.
> This is the top-level task for the review tasks. Each change is identified by 
> a sub-task or linked task in order to keep each code review task small.
> h4. Overview
> The full set of conditional functions includes:
> {noformat}
> if(boolean condition, type ifTrue, type ifFalseOrNull)
> ifnull(type a, type ifNull)
> isfalse(boolean)
> isnotfalse(boolean)
> isnottrue(boolean)
> isnull(type a, type ifNull)
> istrue(boolean)
> nonnullvalue(expression)
> nullif(expr1,expr2)
> nullifzero(numeric_expr)
> nullvalue(expression)
> nvl(type a, type ifNull)
> nvl2(type a, type ifNull, type ifNotNull)
> zeroifnull(numeric_expr)
> {noformat}
> It turns out that conditionals are complex because substantial prior work has 
> gone into optimizing them. The FE has a number of transforms that affect 
> specific conditional statements. The BE has additional transforms. To proceed, 
> each operation must be tracked through the system one by one.
> The discussion below summarizes the state of each of the Impala conditional 
> functions to identify the path needed to implement the requested changes, and 
> to ensure that the changes don't impact other functionality. We also point 
> out a few out-of-scope nice-to-haves as we go along.
> In general, all the action here is in just a few places:
> * {{sql-parser.cup}} in which syntax is reduced to parse nodes such as 
> functions or operators. The parser unifies certain constructs such as {{<=>}} 
> and {{IS NOT DISTINCT FROM}}.
> * {{FunctionCallExpr.createExpr()}} is given a function-like definition and 
> converts some of them to other forms ({{decode()}}, {{nvl2()}}, {{nullif()}}). 
> A nice-to-have would be to move this logic to 
> {{SimplifyConditionalsRule.apply()}} so we have a uniform way of doing 
> transforms.
> * {{SimplifyConditionalsRule}} does a great many transforms of various 
> conditional rules. (We will add more for this task.)
> * {{impala_functions.py}} in the BE provides a mapping from remaining 
> functions (those not optimized away above) to implementations. All functions 
> listed here are cross-compiled into LLVM along with a generated wrapper 
> function that binds the function to its set of arguments.
> * {{conditional-functions.[h|cc]}} handles special case functions that 
> require short-circuit argument evaluation ({{isnull()}}, {{if()}}, 
> {{coalesce()}}). These three functions are never code generated. The goal of 
> this task is to convert these into a code-generated form using {{CASE}}.
> For all expressions, the planner does a check for all-constant expressions 
> (such as {{NULL IS NOT NULL}} or {{(10 = 9) IS TRUE}}) and replaces them with 
> the result of the expression by using the BE to interpret the partial 
> constant-only expression tree. As a result, the rewrite steps focus on the 
> non-trivial cases that require knowledge of the semantics of a given function.
> In the suggestions that follow, we rewrite certain functions into {{CASE}}. 
> But, in so doing, we end up evaluating certain terms twice. IMPALA-7737 asks 
> to resolve that issue.
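
To make the double-evaluation concern concrete, here is a minimal Python sketch. It is not Impala code: SQL {{NULL}} is modelled as {{None}}, each term is a zero-argument callable so evaluations can be counted, and the helper names ({{isnull_native}}, {{isnull_as_case}}) are hypothetical. It compares a short-circuit {{isnull(a, b)}} with a naive evaluation of the equivalent {{CASE}} form.

```python
# Hypothetical model, not Impala code: SQL NULL is Python None, and each
# term is a zero-argument callable so that evaluations can be counted.
calls = {"a": 0}


def a():
    calls["a"] += 1
    return 42  # non-NULL, so the ELSE branch of the CASE form is taken


def b():
    return -1


def isnull_native(x, y):
    """Short-circuit isnull(x, y): x is evaluated once, y only if x is NULL."""
    value = x()
    return y() if value is None else value


def isnull_as_case(x, y):
    """CASE WHEN x IS NULL THEN y ELSE x END, evaluated term by term."""
    return y() if x() is None else x()


calls["a"] = 0
native_result = isnull_native(a, b)
native_calls = calls["a"]

calls["a"] = 0
case_result = isnull_as_case(a, b)
case_calls = calls["a"]

print(native_result, native_calls)  # 42 1
print(case_result, case_calls)      # 42 2
```

Both forms return the same value, but the {{CASE}} form evaluates {{a}} twice whenever {{a}} is not {{NULL}}, which matters if {{a}} is an expensive expression.
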
> Below is a summary of each conditional function that identifies current state 
> and any changes that might be possible.
> h4. {{CASE ...}}
> BE: Interpreted when in the {{SELECT}} clause (IMPALA-4356). Code generated 
> when in the {{WHERE}} clause or in a join.
> h4. {{x IS [NOT] (TRUE | FALSE)}}
> FE, {{sql-parser.cup}}: captured as a {{FunctionCallExpr}} for the equivalent 
> {{ISTRUE\(x)}}, etc. function.
> h4. {{x IS [NOT] NULL}}
> FE, {{sql-parser.cup}}: captured as an {{IsNullPredicate}}. (Note that this is 
> the opposite of {{IS TRUE}}, etc., which are captured as function calls.)
> BE: Cross compiled as a UDF: {{IsNullPredicate::Is[Not]Null}}, with wrapper.
> h4. {{IS[NOT](TRUE|FALSE)\(x)}}
>  
> BE: Implemented in {{ConditionalFunctions::IsTrue}}, etc.
> h4. {{NULLIF(expr1, expr2)}}
> FE, {{FunctionCallExpr}}: {{nullif(expr1, expr2)}} → {{if(expr1 IS 
> DISTINCT FROM expr2, expr1, NULL)}}
> {{NULLIF()}} and {{NVL2()}} vanish from the plan after this step. There is no 
> entry for {{nullif()}} in {{impala_functions.py}}.
> Note that the implementation here differs from the 
> [docs|https://impala.apache.org/docs/build3x/html/topics/impala_conditional_functions.html],
>  which suggest that the rewrite uses equality, both for normal data and for 
> nulls. However, the implementation actually will handle the NaN case for 
> floats once IMPALA-6661 is fixed:
> {code:sql}
> 10 * NULLIF(x, sqrt(-1))
> {code}
> The above will produce a {{NULL}} if {{x}} is {{NaN}}, {{10 * x}} otherwise. 
> This is a hidden bonus of the current implementation.
> *Suggestion:*
> * Move the rewrite rules from {{FunctionCallExpr.createExpr()}} into 
> {{SimplifyConditionalsRule}}.
> * Add a signature for this function to {{impala_functions.py}} so that it 
> appears in {{_impala_builtins}}.
> * Add two simplification rules:
>   * {{nullif(NULL, x)}} → {{NULL}}
>   * {{nullif(x, NULL)}} → {{x}} (since {{x IS DISTINCT FROM NULL}} is true exactly when {{x}} is not null)
> * Directly rewrite to a {{CASE}} expression:
> {code:sql}
> CASE WHEN expr1 IS DISTINCT FROM expr2 THEN expr1 END
> {code}
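
As a sanity check on the {{CASE}} form above, the following Python sketch traces it for {{NULL}} in either argument position. It is only a model: SQL {{NULL}} is Python {{None}}, the helper names are hypothetical, and NaN handling is deliberately left out because Python's float comparison does not match the intended SQL behavior.

```python
def is_distinct_from(x, y):
    """Model of SQL `x IS DISTINCT FROM y`: NULL-safe, never returns NULL."""
    if x is None or y is None:
        return (x is None) != (y is None)
    return x != y


def nullif_case(expr1, expr2):
    """CASE WHEN expr1 IS DISTINCT FROM expr2 THEN expr1 END.

    A CASE with no ELSE branch yields NULL when no WHEN matches.
    """
    return expr1 if is_distinct_from(expr1, expr2) else None


results = {
    "nullif(1, 1)": nullif_case(1, 1),
    "nullif(1, 2)": nullif_case(1, 2),
    "nullif(NULL, 2)": nullif_case(None, 2),
    "nullif(1, NULL)": nullif_case(1, None),
    "nullif(NULL, NULL)": nullif_case(None, None),
}

for call, value in results.items():
    print(call, "->", "NULL" if value is None else value)
```

In this model a {{NULL}} first argument always yields {{NULL}}, while a {{NULL}} second argument returns the first argument unchanged. Any simplification rule for constant {{NULL}} arguments should agree with that.
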
> h4.  {{NVL2(expr, ifNotNull, ifNull)}}
> FE, {{FunctionCallExpr}}: Rewritten to {{if(expr IS NOT NULL, ifNotNull, 
> ifNull)}}. {{nvl2()}} vanishes from the plan at this point and does not 
> appear in {{impala_functions.py}}.
> *Suggestion:*
> * Move the rewrite from {{FunctionCallExpr.createExpr()}} into 
> {{SimplifyConditionalsRule}}.
> * Add a signature for this function to {{impala_functions.py}} so that it 
> appears in {{_impala_builtins}}.
> * Add two simplifications:
>   * {{nvl2(null, a, b)}} → {{b}}
>   * {{nvl2(non-null-literal, a, b)}} → {{a}}
> * Directly rewrite to a {{CASE}} expression:
> {code:sql}
> CASE WHEN expr IS NOT NULL THEN ifNotNull ELSE ifNull END
> {code}
> As it turns out {{decode()}} is a rather special beast because it needs to 
> declare n^2 versions for the full set of types. For this reason, we can't add 
> it to {{impala_functions.py}} and thus can't move the rewrite rules. We'll 
> leave it as the lone remaining rewrite in {{FunctionCallExpr.createExpr()}}.
> h4. {{ISNULL(a, b)}}
> BE: An alias for this function exists in {{impala_functions.py}}, with a 
> special implementation in {{conditional-functions.[h|cc]}}.
> *Suggestion:* Rewrite as:
> {code:sql}
> CASE WHEN a IS NULL THEN b ELSE a END
> {code}
> Since {{isnull()}} would vanish from the plan after this transform, remove 
> the BE implementation.
> h4. {{NVL(a, b)}} \\  {{IFNULL(a, b)}}
> FE, {{SimplifyConditionalsRule}}: Treated the same as {{ISNULL(a, b)}}, but is 
> not rewritten to this form.
> BE: An alias for this function exists in {{impala_functions.py}}.
> *Suggestion:* Rewrite to {{ISNULL(a, b)}} and drop from {{impala_functions.py}} 
> to make things a bit more tidy. (If the suggestion for {{isnull()}} is taken, 
> then even {{isnull()}} vanishes from the plan in the planner.)
> h4. {{[NON]NULLVALUE\(x)}}
> An entry in {{impala_functions.py}} maps this method to the compiled {{IS 
> [NOT] NULL}} operator implementations.
> *Suggestion:* To make {{impala_functions.py}} less messy, add a transform to 
> the FE to replace these functions with the operators, and remove the 
> functions' entries from {{impala_functions.py}}. This also ensures that all 
> optimization applied to the operators is also done for the functions.
> h4. {{x <=> (TRUE | FALSE | NULL)}} \\ {{x IS [NOT] DISTINCT FROM (TRUE | 
> FALSE | NULL)}}
> FE, {{sql-parser.cup}}: Parsed (in generic form) into a 
> {{BinaryPredicate(BinaryPredicate.Operator.(NOT_DISTINCT|DISTINCT_FROM)...)}}
> BE: Implemented, and code generated, as 
> {{Operators::NotDistinct_BooleanVal_BooleanVal}}.
> *Suggestion:* To leverage special Boolean optimizations, rewrite the above to 
> {{IS(TRUE|FALSE)\(x)}} or {{x IS [NOT] NULL}} in the planner. (The planner 
> appears to already rewrite expressions such as {{TRUE <=> x}} into a 
> canonical form so that the rewrite rules need not handle both versions.)
> Note: there is no user-visible function equivalent of these operators. They 
> are "invisible" to the user, but are listed as {{distinctfrom}} and 
> {{notdistinct}} in {{impala_functions.py}}.
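
The proposed Boolean rewrite can be checked exhaustively, since a Boolean expression has only three possible values. The Python sketch below is a model rather than Impala code: {{NULL}} is {{None}}, and the function names simply mirror the conditional functions listed in the overview ({{istrue}}, {{isnottrue}}, {{isfalse}}, {{isnotfalse}}).

```python
def not_distinct(x, y):
    """Model of SQL `x <=> y` (IS NOT DISTINCT FROM): NULL-safe equality."""
    if x is None or y is None:
        return x is None and y is None
    return x == y


def distinct_from(x, y):
    """Model of SQL `x IS DISTINCT FROM y`."""
    return not not_distinct(x, y)


# Models of the Boolean test functions and predicates; none returns NULL.
def istrue(x): return x is True
def isnottrue(x): return x is not True
def isfalse(x): return x is False
def isnotfalse(x): return x is not False
def is_null(x): return x is None
def is_not_null(x): return x is not None


# (operator, constant operand, proposed replacement)
rewrites = [
    (not_distinct, True, istrue),
    (not_distinct, False, isfalse),
    (not_distinct, None, is_null),
    (distinct_from, True, isnottrue),
    (distinct_from, False, isnotfalse),
    (distinct_from, None, is_not_null),
]

mismatches = [
    (op.__name__, const, x)
    for op, const, target in rewrites
    for x in (True, False, None)
    if op(x, const) != target(x)
]
print(mismatches)  # []
```

An empty list means each rewrite agrees with the original operator for all three input values in this model.
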
> h4. {{IF(cond, trueExpr, falseExpr)}}
> FE: {{SimplifyConditionalsRule}} performs basic simplifications.
> BE: Implemented in {{conditional-functions.[h|cc]}} as an interpreted-only 
> function to allow short-circuit argument evaluation.
> *Suggestion:* Rewrite in the FE to
> {code:sql}
> CASE WHEN cond THEN trueExpr ELSE falseExpr END
> {code}
> {{IF()}} will then vanish from the plan so remove the BE implementation.
> h4. {{COALESCE(e1, e2, … en)}}
> FE: {{SimplifyConditionalsRule}} performs basic simplifications.
> BE: Implemented in {{conditional-functions.[h|cc]}} as an interpreted-only 
> function to allow short-circuit argument evaluation.
> *Suggestion:* Rewrite in the FE to
> {noformat}
> CASE [WHEN ei IS NOT NULL THEN ei]* ELSE en END
> {noformat}
> When doing so, extend two existing optimizations.
> 1. Remove not only leading null values, but all null values.
> 2. Special case not just the last non-null literal, but rather when 
> encountering the first such value, drop all remaining terms.
> {{COALESCE()}} will then vanish from the plan so remove the BE 
> implementation. Since this step will remove the last of the special 
> conditional functions, remove {{conditional-functions.[h|cc]}} as well.
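
The rewrite and the two extended optimizations can be sketched as a small Python function. This is an illustration, not the FE implementation: the {{Lit}} class, the use of plain strings for non-literal expressions, and the function name {{coalesce_to_case}} are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class Lit:
    """A literal term; value None stands for the SQL NULL literal."""
    value: Optional[object]


# A plain str stands for any non-literal expression, such as a column name.
Term = Union[str, Lit]


def to_sql(term: Term) -> str:
    if isinstance(term, Lit):
        return "NULL" if term.value is None else repr(term.value)
    return term


def coalesce_to_case(terms: List[Term]) -> str:
    """Rewrite COALESCE(terms...) to CASE [WHEN ei IS NOT NULL THEN ei]* ELSE en END."""
    kept: List[Term] = []
    for term in terms:
        if isinstance(term, Lit) and term.value is None:
            continue  # optimization 1: drop every NULL literal, not only leading ones
        kept.append(term)
        if isinstance(term, Lit):
            break  # optimization 2: the first non-NULL literal ends the list
    if not kept:
        return "NULL"
    if len(kept) == 1:
        return to_sql(kept[0])
    whens = " ".join(
        f"WHEN {to_sql(t)} IS NOT NULL THEN {to_sql(t)}" for t in kept[:-1]
    )
    return f"CASE {whens} ELSE {to_sql(kept[-1])} END"


print(coalesce_to_case(["a", Lit(None), "b", Lit(0), "c"]))
# CASE WHEN a IS NOT NULL THEN a WHEN b IS NOT NULL THEN b ELSE 0 END
print(coalesce_to_case([Lit(None), "a"]))        # a
print(coalesce_to_case([Lit(None), Lit(None)]))  # NULL
```

Note that every term except the last appears twice in the output, which is the double-evaluation issue that IMPALA-7737 asks to resolve.
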
> h4. {{DECODE(expr, search1, result1 [, search2, result2 ...] [, default] )}}
> FE: {{FunctionCallExpr}}, {{CaseExpr}}: Rewrites {{decode()}} to {{CASE}}. 
> {{decode()}} vanishes from the plan after this step.
> See the header of {{CaseExpr.java}} for details. Looks like the 
> implementation was done before {{IS DISTINCT}} was available:
> {quote}
> Example of equivalent {{CASE}} for {{DECODE(foo, 'bar', 1, col, 2, NULL, 3, 
> 4)}}:
> {code:sql}
> CASE 
>     WHEN foo = 'bar' THEN 1   -- no need for IS NULL check
>     WHEN foo IS NULL AND col IS NULL OR foo = col THEN 2
>     WHEN foo IS NULL THEN 3  -- no need for equality check
>     ELSE 4
> END
> {code}
> {quote}
> *Nice-to-have:* In FE, modify to use {{<=>}} (AKA {{IS NOT DISTINCT}}):
> {code:sql}
> CASE [WHEN expr <=> searchi THEN resulti]+ [ELSE default]? END
> {code}
> Example:
> {code:sql}
> CASE
>     WHEN foo <=> 'bar' THEN 1
>     WHEN foo <=> col THEN 2
>     WHEN foo <=> NULL THEN 3
>     ELSE 4
> END
> {code}
> This expansion (and the original one) evaluates the decode expression 
> multiple times and would benefit from the optimization mentioned earlier.
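
To check that the {{<=>}} expansion matches the existing one, the Python sketch below models both forms of {{DECODE(foo, 'bar', 1, col, 2, NULL, 3, 4)}} and compares them over a small set of inputs. It is a model under stated assumptions: {{NULL}} is {{None}}, {{=}} returns {{None}} when either side is {{NULL}}, a {{WHEN}} branch is taken only when its condition is true, and the function names are hypothetical.

```python
from itertools import product


def sql_eq(x, y):
    """Model of SQL `x = y`: NULL (None) if either side is NULL."""
    if x is None or y is None:
        return None
    return x == y


def not_distinct(x, y):
    """Model of SQL `x <=> y`: NULL-safe equality, never returns NULL."""
    if x is None or y is None:
        return x is None and y is None
    return x == y


def decode_current(foo, col):
    """The expansion quoted from the CaseExpr.java header."""
    if sql_eq(foo, "bar") is True:
        return 1
    # `foo IS NULL AND col IS NULL OR foo = col` is true only if one side is true.
    if (foo is None and col is None) or sql_eq(foo, col) is True:
        return 2
    if foo is None:
        return 3
    return 4


def decode_null_safe(foo, col):
    """The proposed expansion using <=>."""
    if not_distinct(foo, "bar"):
        return 1
    if not_distinct(foo, col):
        return 2
    if not_distinct(foo, None):
        return 3
    return 4


foo_values = (None, "bar", "x", "y")
col_values = (None, "bar", "x")
mismatches = [
    (foo, col)
    for foo, col in product(foo_values, col_values)
    if decode_current(foo, col) != decode_null_safe(foo, col)
]
print(mismatches)  # []
```

The two expansions agree on every combination tried here, including the cases where {{foo}}, {{col}}, or both are {{NULL}}.
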
> Note also that {{decode()}} can be used to pick out floating-point NaN values:
> {code:sql}
> decode(float_col, sqrt(-1), 0, float_col)
> {code}
> Here, {{sqrt(-1)}} is used to create a NaN value because Impala has no 
> {{NaN}} constant or function.
> *Suggestion:* The current implementation is rather ad-hoc, probably because 
> of the unusual nature of the types of the arguments to {{decode()}}. It would 
> be cleaner to do the rewrite as a rewrite rule rather than as an ad-hoc step 
> when creating an expression. That is, rather than doing the rewrite in 
> {{FunctionCallExpr.createExpr()}}, do it in 
> {{SimplifyConditionalsRule.apply()}}.
> Doing this would allow us to add an entry for {{decode()}} in the 
> builtin-functions table. To handle the odd arguments, create a one-off 
> {{ScalarFunction}} subclass to do the specialized argument matching.


