[
https://issues.apache.org/jira/browse/IMPALA-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16667519#comment-16667519
]
Paul Rogers commented on IMPALA-7747:
-------------------------------------
Work left over from IMPALA-7655:
h4. Remove \{{conditional-functions.[h|cc]}}
Once IMPALA-7753 is fixed, change the rewrite rules to make the conditional
rewrites mandatory, rather than optional. At that point, the BE implementations
will never be called. So, remove \{{conditional-functions.[h|cc]}} as well.
> Review and modernize conditional function rewrites
> --------------------------------------------------
>
> Key: IMPALA-7747
> URL: https://issues.apache.org/jira/browse/IMPALA-7747
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Affects Versions: Impala 3.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
>
> IMPALA-7655 asks to revisit the rewrite rules for several conditional
> functions. [~philip] suggested that the rewrite rules should apply to [all of
> them|https://impala.apache.org/docs/build3x/html/topics/impala_conditional_functions.html].
> To keep IMPALA-7655 focused, the larger review is presented here, along with
> suggested opportunities to modernize the front-end rewrite rules.
> This is the top-level task for the review tasks, each change is identified by
> a sub-task or linked task in order to keep each code review task small.
> h4. Overview
> The full set of conditional functions include:
> {noformat}
> if(boolean condition, type ifTrue, type ifFalseOrNull)
> ifnull(type a, type ifNull)
> isfalse(boolean)
> isnotfalse(boolean)
> isnottrue(boolean)
> isnull(type a, type ifNull)
> istrue(boolean)
> nonnullvalue(expression)
> nullif(expr1,expr2)
> nullifzero(numeric_expr)
> nullvalue(expression)
> nvl(type a, type ifNull)
> nvl2(type a, type ifNull, type ifNotNull)
> zeroifnull(numeric_expr)
> {noformat}
> Turns out conditionals are complex as substantial prior work has gone into
> optimizations. The FE has a number of transforms that affect specific
> conditional statements. The BE has additional transforms. To proceed, each
> operation must be tracked through the system one by one.
> The discussion below summarizes the state of each of the Impala conditional
> functions to identify the path needed to implement the requested changes, and
> to ensure that the changes don't impact other functionality. We also point
> out a few out-of-scope nice-to-haves as we go along.
> In general, all the action here is in just a few places:
> * {{sql-parser.cup}} in which syntax is reduced to parse nodes such as
> functions or operators. The parser unifies certain constructs such as {{<=>}}
> and {{IS NOT DISTINCT FROM}}.
> * {{FunctionCallExpr.createExpr()}} is given a function-like definition and
> converts some of them to other forms ({{decode()}}, {{nvl2(}}, {{nullif()}}.
> A nice-to-have would be to move this logic to
> {{SimplifyConditionalsRule.apply()}} so we have a uniform way of doing
> transforms.
> * {{SimplifyConditionalsRule}} does a great many transforms of various
> conditional rules. (We will add more for this task.)
> * {{impala_functions.py}} in the BE provides a mapping from remaining
> functions (those not optimized away above) to implementations. All functions
> listed here are cross-compiled into LLVM along with a generated wrapper
> function that binds the function to its set of arguments.
> * {{conditional-functions.[h|cc]}} handles special case functions that
> require short-circuit argument evaluation ({{isull()}}, {{if()}},
> {{coalesce()}}). These three functions are never code generated. The goal of
> this task is to convert these into a code generated for using {{CASE}}.
> For all expressions, the planner does a check for all-constant expressions
> (such as {{NULL IS NOT NULL}} or {{(10 = 9) IS TRUE}}) and replaces them with
> the result of the expression by using the BE to interpret the partial
> constant-only expression tree. As a result, the rewrite steps focus on the
> non-trivial cases that require knowledge of the semantics of a given function.
> In the suggestions that follow, we rewrite certain functions into {{CASE}}.
> But, in so doing, we end up evaluating certain terms twice. IMPALA-7737 asks
> to resolve that issue.
> Below is a summary of each conditional function that identifies current state
> and any changes that might be possible.
> h4. {{CASE ...}}
> BE: Interpreted when in the {{SELECT}} clause (IMPALA-4356). Code generated
> when in the {{WHERE}} clause or in a join.
> h4. {{x IS [NOT] (TRUE | FALSE)}}
> FE, {{sql-parser.cup}}: captured as a {{FunctionCallExpr}} for the equivalent
> {{ISTRUE\(x)}}, etc. function.
> h4. {{x IS [NOT] NULL}}
> FE, {{sql-parser.cup}}: captured as a {{IsNullPredicate}}. (Note that this is
> the opposite of {{IS TRUE}}, etc.)
> BE: Cross compiled as a UDF: {{IsNullPredicate::Is[Not]Null}}, with wrapper.
> h4. {{IS[NOT](TRUE|FALSE)\(x)}}
>
> BE: Implemented in {{ConditionalFunctions::IsTrue}}, etc.
> h4. {{NULLIF(expr1, expr2)}}
> FE, {{FunctionCallExpr}}: {{nullif(expr1, expr2)}} → {{if(expr1 IS
> DISTINCT FROM expr2, expr1, NULL)}}
> {{NULLIF()}} and {{NVL2()}} vanish from the plan after this step. There is no
> entry for {{nullify()}} in {{impala_functions.py}}.
> Note that the implementation here is different from the
> [docs|https://impala.apache.org/docs/build3x/html/topics/impala_conditional_functions.html]
> which suggests that the rewrite uses equality. Both for normal data and
> nulls. However, the implementation actually will handle the NaN case for
> floats once IMPALA-6661 is fixed:
> {code:sql}
> 10 * NULLIF(x, sqrt(-1))
> {code}
> The above will produce a {{NULL}} if {{x}} is {{NaN}}, {{10 * x}} otherwise.
> This is a hidden bonus of the current implementation.
> *Suggestion:*
> * Move the rewrite rules from {{FunctionCallExpr.createExpr()}} into
> {{SimplifyConditionalRules}}.
> * Add a signature for this function to {{impala_functions.py}} so that it
> appears in {{_impala_builtins}}.
> * Add two simplification rules:
> * {{nullif(NULL, x)}} → {{NULL}}
> * {{nullif(x, NULL)}} → {{NULL}}
> * Directly rewrite to a {{CASE}} expression:
> {code:sql}
> CASE WHEN expr1 IS DISTINCT FROM expr2 THEN expr1 END
> {code}
> h4. {{NVL2(expr, ifNotNull, ifNull)}}
> FE, {{FunctionCallExpr}}: Rewritten to {{if(expr IS NOT NULL, ifNotNull,
> ifNull)}}. {{nvl2()}} vanishes from the plan at this point and does not
> appear in {{impala_functions.py}}.
> *Suggestion:*
> * Move rewrite from {{FunctionCallExpr.createExpr()}} into
> {{SimplifyConditionalRules}}.
> * Add a signature for this function to {{impala_functions.py}} so that it
> appears in {{_impala_builtins}}.
> * Add two simplifications:
> * {{nvl2(null, a, b)}} → {{b}}
> * {{nvl2(non-null-listeral, a, b)}} → {{a}}
> * Directly rewrite to a {{CASE}} expression:
> {code:sql}
> CASE WHEN expr IS NOT NULL THEN ifNotNull ELSE ifNull END
> {code}
> h4. {{[NON]NULLVALUE\(x)}}
> An entry in {{impala_functions.py}} maps this method to the compiled {{IS
> [NOT] NULL}} operator implementations.
> *Suggestion:* To make {{impala_functions.py}} less messy, add a transform to
> the FE to replace these functions with the operators, and remove the
> functions' entries from {{impala_functions.py}}. This also ensures that all
> optimization applied to the operators is also done for the functions.
> h4. {{x <=> (TRUE | FALSE | NULL)}} \\ {{x IS [NOT] DISTINCT FROM (TRUE |
> FALSE | NULL)}}
> FE {{sql-parser.cup}}: Parsed (in generic form) into a
> {{BinaryPredicate(BinaryPredicate.Operator.(NOT_DISTINCT|DISTINCT_FROM)...)}}
> BE: Implemented code generated
> {{Operators::NotDistinct_BooleanVal_BooleanVal}}.
> *Suggestion:* To leverage special Boolean optimizations, rewrite the above to
> {{IS(TRUE|FALSE)\(x)}} or {{x IS [NOT] NULL}} in the planner. (The planner
> appears to already rewrite expressions such as {{TRUE <=> x}} into a
> canonical form so that the rewrite rules need not handle both versions.)
> Note: there is no function equivalent of these functions, they are
> "invisible" to the user, but are listed as {{distinctfrom}} and
> {{notdistinct}} in {{impala_functions.py}}.
> h4. {{NULLIFZERO\(x)}} \\ {{ZEROIFNULL\(x)}}
> BE: Implemented as native functions; code generated with wrapper functions.
> h4. {{ISNULL(a, b)}} \\ {{NVL(a, b)}} \\ {{IFNULL(a, b)}} \\ {{IF(cond,
> trueExpr, falseExpr)}} \\ {{COALESCE(e1, e2, … en)}}
> See IMPALA-7655.
> h4. {{DECODE(expr, search1, result1 [, search2, result2 ...] [, default] )}}
> FE: {{FunctionCallExpr}}, {{CaseExpr}}: Rewrites {{decode()}} to {{CASE}}.
> {{decode()}} vanishes from the plan after this step.
> See the header of {{CaseExpr.java}} for details. Looks like the
> implementation was done before {{IS DISTINCT}} was available:
> {quote}
> Example of equivalent {{CASE}} for {{DECODE(foo, 'bar', 1, col, 2, NULL, 3,
> 4)}}:
> {code:sql}
> CASE
> WHEN foo = 'bar' THEN 1 -- no need for IS NULL check
> WHEN foo IS NULL AND col IS NULL OR foo = col THEN 2
> WHEN foo IS NULL THEN 3 -- no need for equality check
> ELSE 4
> END
> {code}
> {quote}
> *Nice-to-have:* In FE, modify to use {{<=>}} (AKA {{IS NOT DISTINCT}}):
> {code:sql}
> CASE [WHEN expr <=> searchi THEN resulti]+ [ELSE default]? END
> {code}
> Example:
> {code:sql}
> CASE
> WHEN foo <=> 'bar' THEN 1
> WHEN foo <=> col THEN 2
> WHEN foo <=> NULL THEN 3
> ELSE 4
> END
> {code}
> This expansion (and the original one) evaluates the decode expression
> multiple times and would benefit from the optimization mentioned earlier.
> Note also that {{decode()}} can be used to pick out floating-point NaN values:
> {code:sql}
> decode(float_col, sqrt(-1), 0, float_col)
> {code}
> Here, {{sqrt(-1)}} is used to create a NaN value because Impala has no
> {{NaN}} constant or function.
> As it turns out {{decode()}} is a rather special beast because it needs to
> declare n^2 versions for the full set of types. For this reason, we can't add
> it to {{impala_functions.py}} and thus can't move the rewrite rules. We'll
> leave it as the lone remaining rewrite in {{FunctionCallExpr.createExpr()}}.
> *Suggestion:* The current implementation is rather ad-hoc, probably because
> of the unusual nature of the types of the arguments to {{decode()}}. Would be
> cleaner to do the rewrite as rewrite rule rather than as an ad-hoc step when
> creating an expression. That is, rather than doing the rewrite in
> {{FunctionCallExpr.createExpr()}}, do it in
> {{SimplifyConditionalsRule.apply()}}.
> Doing this would allow us to add an entry for {[decode()}} in the
> builtin-functions table. To handle the odd arguments, create a one-off
> {{ScalarFunction}} subclass do to the specialized argument matching.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]