[
https://issues.apache.org/jira/browse/IMPALA-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Paul Rogers updated IMPALA-7747:
--------------------------------
Description:
IMPALA-7655 asks to revisit the rewrite rules for several conditional
functions. [~philip] suggested that the rewrite rules should apply to [all of
them|https://impala.apache.org/docs/build3x/html/topics/impala_conditional_functions.html].
To keep IMPALA-7655 focused, the larger review is presented here, along with
suggested opportunities to modernize the front-end rewrite rules.
This is the top-level task for the review tasks, each change is identified by a
sub-task or linked task in order to keep each code review task small.
h4. Overview
The full set of conditional functions include:
{noformat}
if(boolean condition, type ifTrue, type ifFalseOrNull)
ifnull(type a, type ifNull)
isfalse(boolean)
isnotfalse(boolean)
isnottrue(boolean)
isnull(type a, type ifNull)
istrue(boolean)
nonnullvalue(expression)
nullif(expr1,expr2)
nullifzero(numeric_expr)
nullvalue(expression)
nvl(type a, type ifNull)
nvl2(type a, type ifNull, type ifNotNull)
zeroifnull(numeric_expr)
{noformat}
Turns out conditionals are complex as substantial prior work has gone into
optimizations. The FE has a number of transforms that affect specific
conditional statements. The BE has additional transforms. To proceed, each
operation must be tracked through the system one by one.
The discussion below summarizes the state of each of the Impala conditional
functions to identify the path needed to implement the requested changes, and
to ensure that the changes don't impact other functionality. We also point out
a few out-of-scope nice-to-haves as we go along.
In general, all the action here is in just a few places:
* {{sql-parser.cup}} in which syntax is reduced to parse nodes such as
functions or operators. The parser unifies certain constructs such as {{<=>}}
and {{IS NOT DISTINCT FROM}}.
* {{FunctionCallExpr.createExpr()}} is given a function-like definition and
converts some of them to other forms ({{decode()}}, {{nvl2(}}, {{nullif()}}. A
nice-to-have would be to move this logic to
{{SimplifyConditionalsRule.apply()}} so we have a uniform way of doing
transforms.
* {{SimplifyConditionalsRule}} does a great many transforms of various
conditional rules. (We will add more for this task.)
* {{impala_functions.py}} in the BE provides a mapping from remaining functions
(those not optimized away above) to implementations. All functions listed here
are cross-compiled into LLVM along with a generated wrapper function that binds
the function to its set of arguments.
* {{conditional-functions.[h|cc]}} handles special case functions that require
short-circuit argument evaluation ({{isull()}}, {{if()}}, {{coalesce()}}).
These three functions are never code generated. The goal of this task is to
convert these into a code generated for using {{CASE}}.
For all expressions, the planner does a check for all-constant expressions
(such as {{NULL IS NOT NULL}} or {{(10 = 9) IS TRUE}}) and replaces them with
the result of the expression by using the BE to interpret the partial
constant-only expression tree. As a result, the rewrite steps focus on the
non-trivial cases that require knowledge of the semantics of a given function.
In the suggestions that follow, we rewrite certain functions into {{CASE}}.
But, in so doing, we end up evaluating certain terms twice. IMPALA-7737 asks to
resolve that issue.
Below is a summary of each conditional function that identifies current state
and any changes that might be possible.
h4. {{CASE ...}}
BE: Interpreted when in the {{SELECT}} clause (IMPALA-4356). Code generated
when in the {{WHERE}} clause or in a join.
h4. {{x IS [NOT] (TRUE | FALSE)}}
FE, {{sql-parser.cup}}: captured as a {{FunctionCallExpr}} for the equivalent
{{ISTRUE\(x)}}, etc. function.
h4. {{x IS [NOT] NULL}}
FE, {{sql-parser.cup}}: captured as a {{IsNullPredicate}}. (Note that this is
the opposite of {{IS TRUE}}, etc.)
BE: Cross compiled as a UDF: {{IsNullPredicate::Is[Not]Null}}, with wrapper.
h4. {{IS[NOT](TRUE|FALSE)\(x)}}
BE: Implemented in {{ConditionalFunctions::IsTrue}}, etc.
h4. {{NULLIF(expr1, expr2)}}
FE, {{FunctionCallExpr}}: {{nullif(expr1, expr2)}} → {{if(expr1 IS
DISTINCT FROM expr2, expr1, NULL)}}
{{NULLIF()}} and {{NVL2()}} vanish from the plan after this step. There is no
entry for {{nullify()}} in {{impala_functions.py}}.
Note that the implementation here is different from the
[docs|https://impala.apache.org/docs/build3x/html/topics/impala_conditional_functions.html]
which suggests that the rewrite uses equality. Both for normal data and nulls.
However, the implementation actually will handle the NaN case for floats once
IMPALA-6661 is fixed:
{code:sql}
10 * NULLIF(x, sqrt(-1))
{code}
The above will produce a {{NULL}} if {{x}} is {{NaN}}, {{10 * x}} otherwise.
This is a hidden bonus of the current implementation.
*Suggestion:*
* Move the rewrite rules from {{FunctionCallExpr.createExpr()}} into
{{SimplifyConditionalRules}}.
* Add a signature for this function to {{impala_functions.py}} so that it
appears in {{_impala_builtins}}.
* Add two simplification rules:
* {{nullif(NULL, x)}} → {{NULL}}
* {{nullif(x, NULL)}} → {{NULL}}
* Directly rewrite to a {{CASE}} expression:
{code:sql}
CASE WHEN expr1 IS DISTINCT FROM expr2 THEN expr1 END
{code}
h4. {{NVL2(expr, ifNotNull, ifNull)}}
FE, {{FunctionCallExpr}}: Rewritten to {{if(expr IS NOT NULL, ifNotNull,
ifNull)}}. {{nvl2()}} vanishes from the plan at this point and does not appear
in {{impala_functions.py}}.
*Suggestion:*
* Move rewrite from {{FunctionCallExpr.createExpr()}} into
{{SimplifyConditionalRules}}.
* Add a signature for this function to {{impala_functions.py}} so that it
appears in {{_impala_builtins}}.
* Add two simplifications:
* {{nvl2(null, a, b)}} → {{b}}
* {{nvl2(non-null-listeral, a, b)}} → {{a}}
* Directly rewrite to a {{CASE}} expression:
{code:sql}
CASE WHEN expr IS NOT NULL THEN ifNotNull ELSE ifNull END
{code}
h4. {{[NON]NULLVALUE\(x)}}
An entry in {{impala_functions.py}} maps this method to the compiled {{IS [NOT]
NULL}} operator implementations.
*Suggestion:* To make {{impala_functions.py}} less messy, add a transform to
the FE to replace these functions with the operators, and remove the functions'
entries from {{impala_functions.py}}. This also ensures that all optimization
applied to the operators is also done for the functions.
h4. {{x <=> (TRUE | FALSE | NULL)}} \\ {{x IS [NOT] DISTINCT FROM (TRUE | FALSE
| NULL)}}
FE {{sql-parser.cup}}: Parsed (in generic form) into a
{{BinaryPredicate(BinaryPredicate.Operator.(NOT_DISTINCT|DISTINCT_FROM)...)}}
BE: Implemented code generated {{Operators::NotDistinct_BooleanVal_BooleanVal}}.
*Suggestion:* To leverage special Boolean optimizations, rewrite the above to
{{IS(TRUE|FALSE)\(x)}} or {{x IS [NOT] NULL}} in the planner. (The planner
appears to already rewrite expressions such as {{TRUE <=> x}} into a canonical
form so that the rewrite rules need not handle both versions.)
Note: there is no function equivalent of these functions, they are "invisible"
to the user, but are listed as {{distinctfrom}} and {{notdistinct}} in
{{impala_functions.py}}.
h4. {{NULLIFZERO\(x)}} \\ {{ZEROIFNULL\(x)}}
BE: Implemented as native functions; code generated with wrapper functions.
h4. {{ISNULL(a, b)}} \\ {{NVL(a, b)}} \\ {{IFNULL(a, b)}} \\ {{IF(cond,
trueExpr, falseExpr)}} \\ {{COALESCE(e1, e2, … en)}}
See IMPALA-7655.
h4. {{DECODE(expr, search1, result1 [, search2, result2 ...] [, default] )}}
FE: {{FunctionCallExpr}}, {{CaseExpr}}: Rewrites {{decode()}} to {{CASE}}.
{{decode()}} vanishes from the plan after this step.
See the header of {{CaseExpr.java}} for details. Looks like the implementation
was done before {{IS DISTINCT}} was available:
{quote}
Example of equivalent {{CASE}} for {{DECODE(foo, 'bar', 1, col, 2, NULL, 3,
4)}}:
{code:sql}
CASE
WHEN foo = 'bar' THEN 1 -- no need for IS NULL check
WHEN foo IS NULL AND col IS NULL OR foo = col THEN 2
WHEN foo IS NULL THEN 3 -- no need for equality check
ELSE 4
END
{code}
{quote}
*Nice-to-have:* In FE, modify to use {{<=>}} (AKA {{IS NOT DISTINCT}}):
{code:sql}
CASE [WHEN expr <=> searchi THEN resulti]+ [ELSE default]? END
{code}
Example:
{code:sql}
CASE
WHEN foo <=> 'bar' THEN 1
WHEN foo <=> col THEN 2
WHEN foo <=> NULL THEN 3
ELSE 4
END
{code}
This expansion (and the original one) evaluates the decode expression multiple
times and would benefit from the optimization mentioned earlier.
Note also that {{decode()}} can be used to pick out floating-point NaN values:
{code:sql}
decode(float_col, sqrt(-1), 0, float_col)
{code}
Here, {{sqrt(-1)}} is used to create a NaN value because Impala has no {{NaN}}
constant or function.
As it turns out {{decode()}} is a rather special beast because it needs to
declare n^2 versions for the full set of types. For this reason, we can't add
it to {{impala_functions.py}} and thus can't move the rewrite rules. We'll
leave it as the lone remaining rewrite in {{FunctionCallExpr.createExpr()}}.
*Suggestion:* The current implementation is rather ad-hoc, probably because of
the unusual nature of the types of the arguments to {{decode()}}. Would be
cleaner to do the rewrite as rewrite rule rather than as an ad-hoc step when
creating an expression. That is, rather than doing the rewrite in
{{FunctionCallExpr.createExpr()}}, do it in
{{SimplifyConditionalsRule.apply()}}.
Doing this would allow us to add an entry for {[decode()}} in the
builtin-functions table. To handle the odd arguments, create a one-off
{{ScalarFunction}} subclass do to the specialized argument matching.
was:
IMPALA-7655 asks to revisit the rewrite rules for several conditional
functions. [~philip] suggested that the rewrite rules should apply to [all of
them|https://impala.apache.org/docs/build3x/html/topics/impala_conditional_functions.html].
To keep IMPALA-7655 focused, the larger review is presented here, along with
suggested opportunities to modernize the front-end rewrite rules.
This is the top-level task for the review tasks, each change is identified by a
sub-task or linked task in order to keep each code review task small.
h4. Overview
The full set of conditional functions include:
{noformat}
if(boolean condition, type ifTrue, type ifFalseOrNull)
ifnull(type a, type ifNull)
isfalse(boolean)
isnotfalse(boolean)
isnottrue(boolean)
isnull(type a, type ifNull)
istrue(boolean)
nonnullvalue(expression)
nullif(expr1,expr2)
nullifzero(numeric_expr)
nullvalue(expression)
nvl(type a, type ifNull)
nvl2(type a, type ifNull, type ifNotNull)
zeroifnull(numeric_expr)
{noformat}
Turns out conditionals are complex as substantial prior work has gone into
optimizations. The FE has a number of transforms that affect specific
conditional statements. The BE has additional transforms. To proceed, each
operation must be tracked through the system one by one.
The discussion below summarizes the state of each of the Impala conditional
functions to identify the path needed to implement the requested changes, and
to ensure that the changes don't impact other functionality. We also point out
a few out-of-scope nice-to-haves as we go along.
In general, all the action here is in just a few places:
* {{sql-parser.cup}} in which syntax is reduced to parse nodes such as
functions or operators. The parser unifies certain constructs such as {{<=>}}
and {{IS NOT DISTINCT FROM}}.
* {{FunctionCallExpr.createExpr()}} is given a function-like definition and
converts some of them to other forms ({{decode()}}, {{nvl2(}}, {{nullif()}}. A
nice-to-have would be to move this logic to
{{SimplifyConditionalsRule.apply()}} so we have a uniform way of doing
transforms.
* {{SimplifyConditionalsRule}} does a great many transforms of various
conditional rules. (We will add more for this task.)
* {{impala_functions.py}} in the BE provides a mapping from remaining functions
(those not optimized away above) to implementations. All functions listed here
are cross-compiled into LLVM along with a generated wrapper function that binds
the function to its set of arguments.
* {{conditional-functions.[h|cc]}} handles special case functions that require
short-circuit argument evaluation ({{isull()}}, {{if()}}, {{coalesce()}}).
These three functions are never code generated. The goal of this task is to
convert these into a code generated for using {{CASE}}.
For all expressions, the planner does a check for all-constant expressions
(such as {{NULL IS NOT NULL}} or {{(10 = 9) IS TRUE}}) and replaces them with
the result of the expression by using the BE to interpret the partial
constant-only expression tree. As a result, the rewrite steps focus on the
non-trivial cases that require knowledge of the semantics of a given function.
In the suggestions that follow, we rewrite certain functions into {{CASE}}.
But, in so doing, we end up evaluating certain terms twice. IMPALA-7737 asks to
resolve that issue.
Below is a summary of each conditional function that identifies current state
and any changes that might be possible.
h4. {{CASE ...}}
BE: Interpreted when in the {{SELECT}} clause (IMPALA-4356). Code generated
when in the {{WHERE}} clause or in a join.
h4. {{x IS [NOT] (TRUE | FALSE)}}
FE, {{sql-parser.cup}}: captured as a {{FunctionCallExpr}} for the equivalent
{{ISTRUE\(x)}}, etc. function.
h4. {{x IS [NOT] NULL}}
FE, {{sql-parser.cup}}: captured as a {{IsNullPredicate}}. (Note that this is
the opposite of {{IS TRUE}}, etc.)
BE: Cross compiled as a UDF: {{IsNullPredicate::Is[Not]Null}}, with wrapper.
h4. {{IS[NOT](TRUE|FALSE)\(x)}}
BE: Implemented in {{ConditionalFunctions::IsTrue}}, etc.
h4. {{NULLIF(expr1, expr2)}}
FE, {{FunctionCallExpr}}: {{nullif(expr1, expr2)}} → {{if(expr1 IS
DISTINCT FROM expr2, expr1, NULL)}}
{{NULLIF()}} and {{NVL2()}} vanish from the plan after this step. There is no
entry for {{nullify()}} in {{impala_functions.py}}.
Note that the implementation here is different from the
[docs|https://impala.apache.org/docs/build3x/html/topics/impala_conditional_functions.html]
which suggests that the rewrite uses equality. Both for normal data and nulls.
However, the implementation actually will handle the NaN case for floats once
IMPALA-6661 is fixed:
{code:sql}
10 * NULLIF(x, sqrt(-1))
{code}
The above will produce a {{NULL}} if {{x}} is {{NaN}}, {{10 * x}} otherwise.
This is a hidden bonus of the current implementation.
*Suggestion:*
* Move the rewrite rules from {{FunctionCallExpr.createExpr()}} into
{{SimplifyConditionalRules}}.
* Add a signature for this function to {{impala_functions.py}} so that it
appears in {{_impala_builtins}}.
* Add two simplification rules:
* {{nullif(NULL, x)}} → {{NULL}}
* {{nullif(x, NULL)}} → {{NULL}}
* Directly rewrite to a {{CASE}} expression:
{code:sql}
CASE WHEN expr1 IS DISTINCT FROM expr2 THEN expr1 END
{code}
h4. {{NVL2(expr, ifNotNull, ifNull)}}
FE, {{FunctionCallExpr}}: Rewritten to {{if(expr IS NOT NULL, ifNotNull,
ifNull)}}. {{nvl2()}} vanishes from the plan at this point and does not appear
in {{impala_functions.py}}.
*Suggestion:*
* Move rewrite from {{FunctionCallExpr.createExpr()}} into
{{SimplifyConditionalRules}}.
* Add a signature for this function to {{impala_functions.py}} so that it
appears in {{_impala_builtins}}.
* Add two simplifications:
* {{nvl2(null, a, b)}} → {{b}}
* {{nvl2(non-null-listeral, a, b)}} → {{a}}
* Directly rewrite to a {{CASE}} expression:
{code:sql}
CASE WHEN expr IS NOT NULL THEN ifNotNull ELSE ifNull END
{code}
As it turns out {{decode()}} is a rather special beast because it needs to
declare n^2 versions for the full set of types. For this reason, we can't add
it to {{impala_functions.py}} and thus can't move the rewrite rules. We'll
leave it as the lone remaining rewrite in {{FunctionCallExpr.createExpr()}}.
h4. {{ISNULL(a, b)}}
BE: Alias for this method exist in {{impala_functions.py}}, special
implementation in {{conditional-functions.[h|cc]}}.
*Suggestion:* Rewrite as:
{code:sql}
CASE a IS NULL THEN b ELSE a END
{code}
Since {{isnull()}} would vanish from the plan after this transform, remove the
BE implementation.
h4. {{NVL(a, b)}} \\ {{IFNULL(a, b)}}
FE, {{SimplifyConditional}}: Treated same as {{ISNULL(a, b)}}, but is not
rewritten to this form.
BE: Alias for this method exist in {{impala_functions.py}}.
*Suggestion:* Rewrite to {{ISNULL(a, b)}}, drop from {{impala_functions.py}} to
make things a bit more tidy. (If the suggestion for {{isnull()}} is taken, then
even {{isnull()}} vanishes from the plan in the planner.
h4. {{[NON]NULLVALUE\(x)}}
An entry in {{impala_functions.py}} maps this method to the compiled {{IS [NOT]
NULL}} operator implementations.
*Suggestion:* To make {{impala_functions.py}} less messy, add a transform to
the FE to replace these functions with the operators, and remove the functions'
entries from {{impala_functions.py}}. This also ensures that all optimization
applied to the operators is also done for the functions.
h4. {{x <=> (TRUE | FALSE | NULL)}} \\ {{x IS [NOT] DISTINCT FROM (TRUE | FALSE
| NULL)}}
FE {{sql-parser.cup}}: Parsed (in generic form) into a
{{BinaryPredicate(BinaryPredicate.Operator.(NOT_DISTINCT|DISTINCT_FROM)...)}}
BE: Implemented code generated {{Operators::NotDistinct_BooleanVal_BooleanVal}}.
*Suggestion:* To leverage special Boolean optimizations, rewrite the above to
{{IS(TRUE|FALSE)\(x)}} or {{x IS [NOT] NULL}} in the planner. (The planner
appears to already rewrite expressions such as {{TRUE <=> x}} into a canonical
form so that the rewrite rules need not handle both versions.)
Note: there is no function equivalent of these functions, they are "invisible"
to the user, but are listed as {{distinctfrom}} and {{notdistinct}} in
{{impala_functions.py}}.
h4. {{IF(cond, trueExpr, falseExpr)}}
FE: {{SimplifyConditional}} performs basic simplifications.
BE: Implemented in {{conditional-functions.[h|cc]}} as an interpreted-only
function to allow short-circuit argument evaluation.
*Suggestion:* Rewrite in the FE to
{code:sql}
CASE WHEN cond THEN trueExpr ELSE falseExpr END
{code}
{{IF()}} will then vanish from the plan so remove the BE implementation.
h4. {{COALESCE(e1, e2, … en)}}
FE: {{SimplifyConditional}} performs basic simplifications.
BE: Implemented in {{conditional-functions.[h|cc]}} as an interpreted-only
function to allow short-circuit argument evaluation.
*Suggestion:* Rewrite in the FE to
{noformat}
CASE WHEN [ei IS NOT NULL THEN ei]* ELSE en END
{noformat}
When doing so, extend two existing optimizations.
1. Remove not only leading null values, but all null values.
2. Special case not just the last non-null literal, but rather when
encountering the first such value, drop all remaining terms.
{{COLAESCE()}} will then vanish from the plan so remove the BE implementation.
Since this step will remove the last of the special conditional functions,
remove {{conditional-functions.[h|cc]}} as well.
h4. {{DECODE(expr, search1, result1 [, search2, result2 ...] [, default] )}}
FE: {{FunctionCallExpr}}, {{CaseExpr}}: Rewrites {{decode()}} to {{CASE}}.
{{decode()}} vanishes from the plan after this step.
See the header of {{CaseExpr.java}} for details. Looks like the implementation
was done before {{IS DISTINCT}} was available:
{quote}
Example of equivalent {{CASE}} for {{DECODE(foo, 'bar', 1, col, 2, NULL, 3,
4)}}:
{code:sql}
CASE
WHEN foo = 'bar' THEN 1 -- no need for IS NULL check
WHEN foo IS NULL AND col IS NULL OR foo = col THEN 2
WHEN foo IS NULL THEN 3 -- no need for equality check
ELSE 4
END
{code}
{quote}
*Nice-to-have:* In FE, modify to use {{<=>}} (AKA {{IS NOT DISTINCT}}):
{code:sql}
CASE [WHEN expr <=> searchi THEN resulti]+ [ELSE default]? END
{code}
Example:
{code:sql}
CASE
WHEN foo <=> 'bar' THEN 1
WHEN foo <=> col THEN 2
WHEN foo <=> NULL THEN 3
ELSE 4
END
{code}
This expansion (and the original one) evaluates the decode expression multiple
times and would benefit from the optimization mentioned earlier.
Note also that {{decode()}} can be used to pick out floating-point NaN values:
{code:sql}
decode(float_col, sqrt(-1), 0, float_col)
{code}
Here, {{sqrt(-1)}} is used to create a NaN value because Impala has no {{NaN}}
constant or function.
*Suggestion:* The current implementation is rather ad-hoc, probably because of
the unusual nature of the types of the arguments to {{decode()}}. Would be
cleaner to do the rewrite as rewrite rule rather than as an ad-hoc step when
creating an expression. That is, rather than doing the rewrite in
{{FunctionCallExpr.createExpr()}}, do it in
{{SimplifyConditionalsRule.apply()}}.
Doing this would allow us to add an entry for {[decode()}} in the
builtin-functions table. To handle the odd arguments, create a one-off
{{ScalarFunction}} subclass do to the specialized argument matching.
> Review and modernize conditional function rewrites
> --------------------------------------------------
>
> Key: IMPALA-7747
> URL: https://issues.apache.org/jira/browse/IMPALA-7747
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Affects Versions: Impala 3.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
>
> IMPALA-7655 asks to revisit the rewrite rules for several conditional
> functions. [~philip] suggested that the rewrite rules should apply to [all of
> them|https://impala.apache.org/docs/build3x/html/topics/impala_conditional_functions.html].
> To keep IMPALA-7655 focused, the larger review is presented here, along with
> suggested opportunities to modernize the front-end rewrite rules.
> This is the top-level task for the review tasks, each change is identified by
> a sub-task or linked task in order to keep each code review task small.
> h4. Overview
> The full set of conditional functions include:
> {noformat}
> if(boolean condition, type ifTrue, type ifFalseOrNull)
> ifnull(type a, type ifNull)
> isfalse(boolean)
> isnotfalse(boolean)
> isnottrue(boolean)
> isnull(type a, type ifNull)
> istrue(boolean)
> nonnullvalue(expression)
> nullif(expr1,expr2)
> nullifzero(numeric_expr)
> nullvalue(expression)
> nvl(type a, type ifNull)
> nvl2(type a, type ifNull, type ifNotNull)
> zeroifnull(numeric_expr)
> {noformat}
> Turns out conditionals are complex as substantial prior work has gone into
> optimizations. The FE has a number of transforms that affect specific
> conditional statements. The BE has additional transforms. To proceed, each
> operation must be tracked through the system one by one.
> The discussion below summarizes the state of each of the Impala conditional
> functions to identify the path needed to implement the requested changes, and
> to ensure that the changes don't impact other functionality. We also point
> out a few out-of-scope nice-to-haves as we go along.
> In general, all the action here is in just a few places:
> * {{sql-parser.cup}} in which syntax is reduced to parse nodes such as
> functions or operators. The parser unifies certain constructs such as {{<=>}}
> and {{IS NOT DISTINCT FROM}}.
> * {{FunctionCallExpr.createExpr()}} is given a function-like definition and
> converts some of them to other forms ({{decode()}}, {{nvl2(}}, {{nullif()}}.
> A nice-to-have would be to move this logic to
> {{SimplifyConditionalsRule.apply()}} so we have a uniform way of doing
> transforms.
> * {{SimplifyConditionalsRule}} does a great many transforms of various
> conditional rules. (We will add more for this task.)
> * {{impala_functions.py}} in the BE provides a mapping from remaining
> functions (those not optimized away above) to implementations. All functions
> listed here are cross-compiled into LLVM along with a generated wrapper
> function that binds the function to its set of arguments.
> * {{conditional-functions.[h|cc]}} handles special case functions that
> require short-circuit argument evaluation ({{isull()}}, {{if()}},
> {{coalesce()}}). These three functions are never code generated. The goal of
> this task is to convert these into a code generated for using {{CASE}}.
> For all expressions, the planner does a check for all-constant expressions
> (such as {{NULL IS NOT NULL}} or {{(10 = 9) IS TRUE}}) and replaces them with
> the result of the expression by using the BE to interpret the partial
> constant-only expression tree. As a result, the rewrite steps focus on the
> non-trivial cases that require knowledge of the semantics of a given function.
> In the suggestions that follow, we rewrite certain functions into {{CASE}}.
> But, in so doing, we end up evaluating certain terms twice. IMPALA-7737 asks
> to resolve that issue.
> Below is a summary of each conditional function that identifies current state
> and any changes that might be possible.
> h4. {{CASE ...}}
> BE: Interpreted when in the {{SELECT}} clause (IMPALA-4356). Code generated
> when in the {{WHERE}} clause or in a join.
> h4. {{x IS [NOT] (TRUE | FALSE)}}
> FE, {{sql-parser.cup}}: captured as a {{FunctionCallExpr}} for the equivalent
> {{ISTRUE\(x)}}, etc. function.
> h4. {{x IS [NOT] NULL}}
> FE, {{sql-parser.cup}}: captured as a {{IsNullPredicate}}. (Note that this is
> the opposite of {{IS TRUE}}, etc.)
> BE: Cross compiled as a UDF: {{IsNullPredicate::Is[Not]Null}}, with wrapper.
> h4. {{IS[NOT](TRUE|FALSE)\(x)}}
>
> BE: Implemented in {{ConditionalFunctions::IsTrue}}, etc.
> h4. {{NULLIF(expr1, expr2)}}
> FE, {{FunctionCallExpr}}: {{nullif(expr1, expr2)}} → {{if(expr1 IS
> DISTINCT FROM expr2, expr1, NULL)}}
> {{NULLIF()}} and {{NVL2()}} vanish from the plan after this step. There is no
> entry for {{nullify()}} in {{impala_functions.py}}.
> Note that the implementation here is different from the
> [docs|https://impala.apache.org/docs/build3x/html/topics/impala_conditional_functions.html]
> which suggests that the rewrite uses equality. Both for normal data and
> nulls. However, the implementation actually will handle the NaN case for
> floats once IMPALA-6661 is fixed:
> {code:sql}
> 10 * NULLIF(x, sqrt(-1))
> {code}
> The above will produce a {{NULL}} if {{x}} is {{NaN}}, {{10 * x}} otherwise.
> This is a hidden bonus of the current implementation.
> *Suggestion:*
> * Move the rewrite rules from {{FunctionCallExpr.createExpr()}} into
> {{SimplifyConditionalRules}}.
> * Add a signature for this function to {{impala_functions.py}} so that it
> appears in {{_impala_builtins}}.
> * Add two simplification rules:
> * {{nullif(NULL, x)}} → {{NULL}}
> * {{nullif(x, NULL)}} → {{NULL}}
> * Directly rewrite to a {{CASE}} expression:
> {code:sql}
> CASE WHEN expr1 IS DISTINCT FROM expr2 THEN expr1 END
> {code}
> h4. {{NVL2(expr, ifNotNull, ifNull)}}
> FE, {{FunctionCallExpr}}: Rewritten to {{if(expr IS NOT NULL, ifNotNull,
> ifNull)}}. {{nvl2()}} vanishes from the plan at this point and does not
> appear in {{impala_functions.py}}.
> *Suggestion:*
> * Move rewrite from {{FunctionCallExpr.createExpr()}} into
> {{SimplifyConditionalRules}}.
> * Add a signature for this function to {{impala_functions.py}} so that it
> appears in {{_impala_builtins}}.
> * Add two simplifications:
> * {{nvl2(null, a, b)}} → {{b}}
> * {{nvl2(non-null-listeral, a, b)}} → {{a}}
> * Directly rewrite to a {{CASE}} expression:
> {code:sql}
> CASE WHEN expr IS NOT NULL THEN ifNotNull ELSE ifNull END
> {code}
> h4. {{[NON]NULLVALUE\(x)}}
> An entry in {{impala_functions.py}} maps this method to the compiled {{IS
> [NOT] NULL}} operator implementations.
> *Suggestion:* To make {{impala_functions.py}} less messy, add a transform to
> the FE to replace these functions with the operators, and remove the
> functions' entries from {{impala_functions.py}}. This also ensures that all
> optimization applied to the operators is also done for the functions.
> h4. {{x <=> (TRUE | FALSE | NULL)}} \\ {{x IS [NOT] DISTINCT FROM (TRUE |
> FALSE | NULL)}}
> FE {{sql-parser.cup}}: Parsed (in generic form) into a
> {{BinaryPredicate(BinaryPredicate.Operator.(NOT_DISTINCT|DISTINCT_FROM)...)}}
> BE: Implemented code generated
> {{Operators::NotDistinct_BooleanVal_BooleanVal}}.
> *Suggestion:* To leverage special Boolean optimizations, rewrite the above to
> {{IS(TRUE|FALSE)\(x)}} or {{x IS [NOT] NULL}} in the planner. (The planner
> appears to already rewrite expressions such as {{TRUE <=> x}} into a
> canonical form so that the rewrite rules need not handle both versions.)
> Note: there is no function equivalent of these functions, they are
> "invisible" to the user, but are listed as {{distinctfrom}} and
> {{notdistinct}} in {{impala_functions.py}}.
> h4. {{NULLIFZERO\(x)}} \\ {{ZEROIFNULL\(x)}}
> BE: Implemented as native functions; code generated with wrapper functions.
> h4. {{ISNULL(a, b)}} \\ {{NVL(a, b)}} \\ {{IFNULL(a, b)}} \\ {{IF(cond,
> trueExpr, falseExpr)}} \\ {{COALESCE(e1, e2, … en)}}
> See IMPALA-7655.
> h4. {{DECODE(expr, search1, result1 [, search2, result2 ...] [, default] )}}
> FE: {{FunctionCallExpr}}, {{CaseExpr}}: Rewrites {{decode()}} to {{CASE}}.
> {{decode()}} vanishes from the plan after this step.
> See the header of {{CaseExpr.java}} for details. Looks like the
> implementation was done before {{IS DISTINCT}} was available:
> {quote}
> Example of equivalent {{CASE}} for {{DECODE(foo, 'bar', 1, col, 2, NULL, 3,
> 4)}}:
> {code:sql}
> CASE
> WHEN foo = 'bar' THEN 1 -- no need for IS NULL check
> WHEN foo IS NULL AND col IS NULL OR foo = col THEN 2
> WHEN foo IS NULL THEN 3 -- no need for equality check
> ELSE 4
> END
> {code}
> {quote}
> *Nice-to-have:* In FE, modify to use {{<=>}} (AKA {{IS NOT DISTINCT}}):
> {code:sql}
> CASE [WHEN expr <=> searchi THEN resulti]+ [ELSE default]? END
> {code}
> Example:
> {code:sql}
> CASE
> WHEN foo <=> 'bar' THEN 1
> WHEN foo <=> col THEN 2
> WHEN foo <=> NULL THEN 3
> ELSE 4
> END
> {code}
> This expansion (and the original one) evaluates the decode expression
> multiple times and would benefit from the optimization mentioned earlier.
> Note also that {{decode()}} can be used to pick out floating-point NaN values:
> {code:sql}
> decode(float_col, sqrt(-1), 0, float_col)
> {code}
> Here, {{sqrt(-1)}} is used to create a NaN value because Impala has no
> {{NaN}} constant or function.
> As it turns out {{decode()}} is a rather special beast because it needs to
> declare n^2 versions for the full set of types. For this reason, we can't add
> it to {{impala_functions.py}} and thus can't move the rewrite rules. We'll
> leave it as the lone remaining rewrite in {{FunctionCallExpr.createExpr()}}.
> *Suggestion:* The current implementation is rather ad-hoc, probably because
> of the unusual nature of the types of the arguments to {{decode()}}. Would be
> cleaner to do the rewrite as rewrite rule rather than as an ad-hoc step when
> creating an expression. That is, rather than doing the rewrite in
> {{FunctionCallExpr.createExpr()}}, do it in
> {{SimplifyConditionalsRule.apply()}}.
> Doing this would allow us to add an entry for {[decode()}} in the
> builtin-functions table. To handle the odd arguments, create a one-off
> {{ScalarFunction}} subclass do to the specialized argument matching.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]