[
https://issues.apache.org/jira/browse/IMPALA-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661109#comment-16661109
]
Paul Rogers commented on IMPALA-7747:
-------------------------------------
To summarize the proposed changes:
IMPALA-7655:
* Rewrite {{isnull()}} to use {{CASE}}. (Done)
* Rewrite {{coalesce()}} to use {{CASE}}. (Code done. See
* Rewrite {{if()}} to use {{CASE}}.
Additional changes:
* Rewrite {{ifnull()}} and {{nvl()}} to {{isnull()}}. (Done)
* Modify the {{decode()}} rewrite to use {{IS NOT DISTINCT}}. (Done.)
* Implement special handling for {{decode()}} in the builtins table, then move
code for rewrite out of {{CaseExpr}} and into {{SimplifyConditionalRules}}.
(Done.)
* Address IMPALA-7741 as this is needed to allow removing BE implementations of
functions while keeping the functions in the FE function registry. (Done.)
* Change handling of {{nvl2()}} and {{nullif()}} to perform the rewriting in
{{SimplifyConditionalRules}} rather than when creating the function expression.
And the simplifications suggested above.
* No tests exist to verify the rewrite of {{decode()}} into {{CASE}}. Add such
tests, including the use of the {{IS NOT DISTINCT}} operator as noted above.
(Done)
* Add tests for the additional simplifications described above. (Done)
Tasks will be grouped into separate JIRA tickets to keep each code review to a
manageable size.
> Review and modernize conditional function rewrites
> --------------------------------------------------
>
> Key: IMPALA-7747
> URL: https://issues.apache.org/jira/browse/IMPALA-7747
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Affects Versions: Impala 3.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
>
> IMPALA-7655 asks to revisit the rewrite rules for several conditional
> functions. [~philip] suggested that the rewrite rules should apply to [all of
> them|https://impala.apache.org/docs/build3x/html/topics/impala_conditional_functions.html].
> To keep IMPALA-7655 focused, the larger review is presented here, along with
> suggested opportunities to modernize the front-end rewrite rules.
> This is the top-level task for the review tasks, each change is identified by
> a sub-task or linked task in order to keep each code review task small.
> h4. Overview
> The full set of conditional functions include:
> {noformat}
> if(boolean condition, type ifTrue, type ifFalseOrNull)
> ifnull(type a, type ifNull)
> isfalse(boolean)
> isnotfalse(boolean)
> isnottrue(boolean)
> isnull(type a, type ifNull)
> istrue(boolean)
> nonnullvalue(expression)
> nullif(expr1,expr2)
> nullifzero(numeric_expr)
> nullvalue(expression)
> nvl(type a, type ifNull)
> nvl2(type a, type ifNull, type ifNotNull)
> zeroifnull(numeric_expr)
> {noformat}
> Turns out conditionals are complex as substantial prior work has gone into
> optimizations. The FE has a number of transforms that affect specific
> conditional statements. The BE has additional transforms. To proceed, each
> operation must be tracked through the system one by one.
> The discussion below summarizes the state of each of the Impala conditional
> functions to identify the path needed to implement the requested changes, and
> to ensure that the changes don't impact other functionality. We also point
> out a few out-of-scope nice-to-haves as we go along.
> In general, all the action here is in just a few places:
> * {{sql-parser.cup}} in which syntax is reduced to parse nodes such as
> functions or operators. The parser unifies certain constructs such as {{<=>}}
> and {{IS NOT DISTINCT FROM}}.
> * {{FunctionCallExpr.createExpr()}} is given a function-like definition and
> converts some of them to other forms ({{decode()}}, {{nvl2(}}, {{nullif()}}.
> A nice-to-have would be to move this logic to
> {{SimplifyConditionalsRule.apply()}} so we have a uniform way of doing
> transforms.
> * {{SimplifyConditionalsRule}} does a great many transforms of various
> conditional rules. (We will add more for this task.)
> * {{impala_functions.py}} in the BE provides a mapping from remaining
> functions (those not optimized away above) to implementations. All functions
> listed here are cross-compiled into LLVM along with a generated wrapper
> function that binds the function to its set of arguments.
> * {{conditional-functions.[h|cc]}} handles special case functions that
> require short-circuit argument evaluation ({{isull()}}, {{if()}},
> {{coalesce()}}). These three functions are never code generated. The goal of
> this task is to convert these into a code generated for using {{CASE}}.
> For all expressions, the planner does a check for all-constant expressions
> (such as {{NULL IS NOT NULL}} or {{(10 = 9) IS TRUE}}) and replaces them with
> the result of the expression by using the BE to interpret the partial
> constant-only expression tree. As a result, the rewrite steps focus on the
> non-trivial cases that require knowledge of the semantics of a given function.
> In the suggestions that follow, we rewrite certain functions into {{CASE}}.
> But, in so doing, we end up evaluating certain terms twice. IMPALA-7737 asks
> to resolve that issue.
> Below is a summary of each conditional function that identifies current state
> and any changes that might be possible.
> h4. {{CASE ...}}
> BE: Interpreted when in the {{SELECT}} clause (IMPALA-4356). Code generated
> when in the {{WHERE}} clause or in a join.
> h4. {{x IS [NOT] (TRUE | FALSE)}}
> FE, {{sql-parser.cup}}: captured as a {{FunctionCallExpr}} for the equivalent
> {{ISTRUE\(x)}}, etc. function.
> h4. {{x IS [NOT] NULL}}
> FE, {{sql-parser.cup}}: captured as a {{IsNullPredicate}}. (Note that this is
> the opposite of {{IS TRUE}}, etc.)
> BE: Cross compiled as a UDF: {{IsNullPredicate::Is[Not]Null}}, with wrapper.
> h4. {{IS[NOT](TRUE|FALSE)\(x)}}
>
> BE: Implemented in {{ConditionalFunctions::IsTrue}}, etc.
> h4. {{NULLIF(expr1, expr2)}}
> FE, {{FunctionCallExpr}}: {{nullif(expr1, expr2)}} → {{if(expr1 IS
> DISTINCT FROM expr2, expr1, NULL)}}
> {{NULLIF()}} and {{NVL2()}} vanish from the plan after this step. There is no
> entry for {{nullify()}} in {{impala_functions.py}}.
> Note that the implementation here is different from the
> [docs|https://impala.apache.org/docs/build3x/html/topics/impala_conditional_functions.html]
> which suggests that the rewrite uses equality. Both for normal data and
> nulls. However, the implementation actually will handle the NaN case for
> floats once IMPALA-6661 is fixed:
> {code:sql}
> 10 * NULLIF(x, sqrt(-1))
> {code}
> The above will produce a {{NULL}} if {{x}} is {{NaN}}, {{10 * x}} otherwise.
> This is a hidden bonus of the current implementation.
> *Suggestion:*
> * Move the rewrite rules from {{FunctionCallExpr.createExpr()}} into
> {{SimplifyConditionalRules}}.
> * Add a signature for this function to {{impala_functions.py}} so that it
> appears in {{_impala_builtins}}.
> * Add two simplification rules:
> * {{nullif(NULL, x)}} → {{NULL}}
> * {{nullif(x, NULL)}} → {{NULL}}
> * Directly rewrite to a {{CASE}} expression:
> {code:sql}
> CASE WHEN expr1 IS DISTINCT FROM expr2 THEN expr1 END
> {code}
> h4. {{NVL2(expr, ifNotNull, ifNull)}}
> FE, {{FunctionCallExpr}}: Rewritten to {{if(expr IS NOT NULL, ifNotNull,
> ifNull)}}. {{nvl2()}} vanishes from the plan at this point and does not
> appear in {{impala_functions.py}}.
> *Suggestion:*
> * Move rewrite from {{FunctionCallExpr.createExpr()}} into
> {{SimplifyConditionalRules}}.
> * Add a signature for this function to {{impala_functions.py}} so that it
> appears in {{_impala_builtins}}.
> * Add two simplifications:
> * {{nvl2(null, a, b)}} → {{b}}
> * {{nvl2(non-null-listeral, a, b)}} → {{a}}
> * Directly rewrite to a {{CASE}} expression:
> {code:sql}
> CASE WHEN expr IS NOT NULL THEN ifNotNull ELSE ifNull END
> {code}
> As it turns out {{decode()}} is a rather special beast because it needs to
> declare n^2 versions for the full set of types. For this reason, we can't add
> it to {{impala_functions.py}} and thus can't move the rewrite rules. We'll
> leave it as the lone remaining rewrite in {{FunctionCallExpr.createExpr()}}.
> h4. {{ISNULL(a, b)}}
> BE: Alias for this method exist in {{impala_functions.py}}, special
> implementation in {{conditional-functions.[h|cc]}}.
> *Suggestion:* Rewrite as:
> {code:sql}
> CASE a IS NULL THEN b ELSE a END
> {code}
> Since {{isnull()}} would vanish from the plan after this transform, remove
> the BE implementation.
> h4. {{NVL(a, b)}} \\ {{IFNULL(a, b)}}
> FE, {{SimplifyConditional}}: Treated same as {{ISNULL(a, b)}}, but is not
> rewritten to this form.
> BE: Alias for this method exist in {{impala_functions.py}}.
> *Suggestion:* Rewrite to {{ISNULL(a, b)}}, drop from {{impala_functions.py}}
> to make things a bit more tidy. (If the suggestion for {{isnull()}} is taken,
> then even {{isnull()}} vanishes from the plan in the planner.
> h4. {{[NON]NULLVALUE\(x)}}
> An entry in {{impala_functions.py}} maps this method to the compiled {{IS
> [NOT] NULL}} operator implementations.
> *Suggestion:* To make {{impala_functions.py}} less messy, add a transform to
> the FE to replace these functions with the operators, and remove the
> functions' entries from {{impala_functions.py}}. This also ensures that all
> optimization applied to the operators is also done for the functions.
> h4. {{x <=> (TRUE | FALSE | NULL)}} \\ {{x IS [NOT] DISTINCT FROM (TRUE |
> FALSE | NULL)}}
> FE {{sql-parser.cup}}: Parsed (in generic form) into a
> {{BinaryPredicate(BinaryPredicate.Operator.(NOT_DISTINCT|DISTINCT_FROM)...)}}
> BE: Implemented code generated
> {{Operators::NotDistinct_BooleanVal_BooleanVal}}.
> *Suggestion:* To leverage special Boolean optimizations, rewrite the above to
> {{IS(TRUE|FALSE)\(x)}} or {{x IS [NOT] NULL}} in the planner. (The planner
> appears to already rewrite expressions such as {{TRUE <=> x}} into a
> canonical form so that the rewrite rules need not handle both versions.)
> Note: there is no function equivalent of these functions, they are
> "invisible" to the user, but are listed as {{distinctfrom}} and
> {{notdistinct}} in {{impala_functions.py}}.
> h4. {{IF(cond, trueExpr, falseExpr)}}
> FE: {{SimplifyConditional}} performs basic simplifications.
> BE: Implemented in {{conditional-functions.[h|cc]}} as an interpreted-only
> function to allow short-circuit argument evaluation.
> *Suggestion:* Rewrite in the FE to
> {code:sql}
> CASE WHEN cond THEN trueExpr ELSE falseExpr END
> {code}
> {{IF()}} will then vanish from the plan so remove the BE implementation.
> h4. {{COALESCE(e1, e2, … en)}}
> FE: {{SimplifyConditional}} performs basic simplifications.
> BE: Implemented in {{conditional-functions.[h|cc]}} as an interpreted-only
> function to allow short-circuit argument evaluation.
> *Suggestion:* Rewrite in the FE to
> {noformat}
> CASE WHEN [ei IS NOT NULL THEN ei]* ELSE en END
> {noformat}
> When doing so, extend two existing optimizations.
> 1. Remove not only leading null values, but all null values.
> 2. Special case not just the last non-null literal, but rather when
> encountering the first such value, drop all remaining terms.
> {{COLAESCE()}} will then vanish from the plan so remove the BE
> implementation. Since this step will remove the last of the special
> conditional functions, remove {{conditional-functions.[h|cc]}} as well.
> h4. {{DECODE(expr, search1, result1 [, search2, result2 ...] [, default] )}}
> FE: {{FunctionCallExpr}}, {{CaseExpr}}: Rewrites {{decode()}} to {{CASE}}.
> {{decode()}} vanishes from the plan after this step.
> See the header of {{CaseExpr.java}} for details. Looks like the
> implementation was done before {{IS DISTINCT}} was available:
> {quote}
> Example of equivalent {{CASE}} for {{DECODE(foo, 'bar', 1, col, 2, NULL, 3,
> 4)}}:
> {code:sql}
> CASE
> WHEN foo = 'bar' THEN 1 -- no need for IS NULL check
> WHEN foo IS NULL AND col IS NULL OR foo = col THEN 2
> WHEN foo IS NULL THEN 3 -- no need for equality check
> ELSE 4
> END
> {code}
> {quote}
> *Nice-to-have:* In FE, modify to use {{<=>}} (AKA {{IS NOT DISTINCT}}):
> {code:sql}
> CASE [WHEN expr <=> searchi THEN resulti]+ [ELSE default]? END
> {code}
> Example:
> {code:sql}
> CASE
> WHEN foo <=> 'bar' THEN 1
> WHEN foo <=> col THEN 2
> WHEN foo <=> NULL THEN 3
> ELSE 4
> END
> {code}
> This expansion (and the original one) evaluates the decode expression
> multiple times and would benefit from the optimization mentioned earlier.
> Note also that {{decode()}} can be used to pick out floating-point NaN values:
> {code:sql}
> decode(float_col, sqrt(-1), 0, float_col)
> {code}
> Here, {{sqrt(-1)}} is used to create a NaN value because Impala has no
> {{NaN}} constant or function.
> *Suggestion:* The current implementation is rather ad-hoc, probably because
> of the unusual nature of the types of the arguments to {{decode()}}. Would be
> cleaner to do the rewrite as rewrite rule rather than as an ad-hoc step when
> creating an expression. That is, rather than doing the rewrite in
> {{FunctionCallExpr.createExpr()}}, do it in
> {{SimplifyConditionalsRule.apply()}}.
> Doing this would allow us to add an entry for {[decode()}} in the
> builtin-functions table. To handle the odd arguments, create a one-off
> {{ScalarFunction}} subclass do to the specialized argument matching.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]