[ 
https://issues.apache.org/jira/browse/IMPALA-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657532#comment-16657532
 ] 

Paul Rogers edited comment on IMPALA-7655 at 10/20/18 12:17 AM:
----------------------------------------------------------------

This note is a work in progress in which we will track the behavior of each 
conditional function and operator in Impala. It will show which optimizations 
currently exist, which functions/operators are code generated, and provide the 
foundation for working out how to optimize the others, including those in the 
ticket title.

For all expressions, the planner does a check for all-constant expressions 
(such as {{NULL IS NOT NULL}} or {{(10 = 9) IS TRUE}}) and replaces them with 
the result of the expression by using the BE to interpret the partial 
constant-only expression tree.

Note that, in order for the above to work, any function-level rewrites must 
happen _before_ the constant evaluation, else the BE may not know how to 
interpret the function. *TODO:* Verify the order of these steps.
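
As a rough illustration of the constant-folding step described above, here is a minimal Python sketch. It is not Impala code: {{Lit}}, {{Col}}, {{Op}}, {{evaluate}} and {{fold}} are hypothetical names, and {{evaluate}} merely stands in for the BE interpreter. Note that {{evaluate}} raises for an operator it does not know, which mirrors the ordering concern: a function that should have been rewritten first cannot be folded.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Lit:
    value: Optional[object]  # None models SQL NULL


@dataclass(frozen=True)
class Col:
    name: str  # a column reference, never constant


@dataclass(frozen=True)
class Op:
    name: str    # e.g. "=", "IS TRUE", "IS NOT NULL"
    args: tuple  # child expressions


Expr = Union[Lit, Col, Op]


def evaluate(op: Op) -> Lit:
    """Stand-in for the BE interpreter; expects every argument to be a Lit."""
    vals = [a.value for a in op.args]
    if op.name == "=":
        # Comparison with NULL yields NULL.
        return Lit(None if None in vals else vals[0] == vals[1])
    if op.name == "IS TRUE":
        return Lit(vals[0] is True)
    if op.name == "IS NOT NULL":
        return Lit(vals[0] is not None)
    raise ValueError(f"unsupported operator: {op.name}")


def fold(e: Expr) -> Expr:
    """Fold constant-only subtrees bottom-up; leave everything else alone."""
    if not isinstance(e, Op):
        return e
    args = tuple(fold(a) for a in e.args)
    folded = Op(e.name, args)
    if all(isinstance(a, Lit) for a in args):
        return evaluate(folded)
    return folded
```

For example, {{fold(Op("IS TRUE", (Op("=", (Lit(10), Lit(9))),)))}} collapses to a single literal, while a tree that contains a {{Col}} keeps its non-constant part and only the constant subtree is replaced.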

h4. {{CASE ...}}

BE: Interpreted when in the {{SELECT}} clause (IMPALA-4356). Code generated 
when in the {{WHERE}} clause or in a join.

h4. {{x IS TRUE}}

BE: Cross-compiled from {{ConditionalFunctions::IsTrue}} using a wrapper 
function {{ConditionalFunctions::IsTrueWrapper}}.

*TODO:* For this operator and the others in this family, determine whether the 
optimized code strips away the overhead of the wrapper and implementation. If 
not, generating code directly for this operator should be relatively easy (or 
at least as easy as LLVM ever gets).

h4. {{x IS NOT TRUE}}

BE: Cross-compiled function: {{ConditionalFunctions::IsNotTrue}}, with wrapper.

h4. {{x IS FALSE}}

BE: Cross-compiled function: {{ConditionalFunctions::IsFalse}}, with wrapper.

h4. {{x IS NOT FALSE}}

BE: Cross-compiled function: {{ConditionalFunctions::IsNotFalse}}, with wrapper.
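
For reference, the semantics that any directly generated code for this family would have to preserve can be sketched as below. This is a Python model using standard SQL three-valued logic, with {{None}} standing in for {{NULL}}; the function names are hypothetical and are not the BE implementation. The point is that all four operators return a plain boolean and never {{NULL}}.

```python
from typing import Optional


def is_true(x: Optional[bool]) -> bool:
    """x IS TRUE: only a non-NULL true value qualifies."""
    return x is True


def is_not_true(x: Optional[bool]) -> bool:
    """x IS NOT TRUE: holds for both FALSE and NULL."""
    return x is not True


def is_false(x: Optional[bool]) -> bool:
    """x IS FALSE: only a non-NULL false value qualifies."""
    return x is False


def is_not_false(x: Optional[bool]) -> bool:
    """x IS NOT FALSE: holds for both TRUE and NULL."""
    return x is not False


# Print the truth table: one row per input, NULL shown as None.
for value in (True, False, None):
    print(value, is_true(value), is_not_true(value),
          is_false(value), is_not_false(value))
```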

h4. {{ISTRUE(expr)}}

BE: Cross-compiled function: {{ConditionalFunctions::IsTrue}}, with wrapper.

This implies that the function is rewritten to the operator form {{IS TRUE}}, 
or vice versa.

h4. {{NULLIF(expr1, expr2)}}

FE, {{FunctionCallExpr}}: {{nullif(expr1, expr2)}} → {{if(expr1 IS 
DISTINCT FROM expr2, expr1, NULL)}}

{{nullif}} vanishes from the plan after this step. There is no entry for 
{{nullif()}} in {{impala_functions.py}}.
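
To check that the rewritten form behaves like {{nullif}}, the following Python sketch models {{IS DISTINCT FROM}} and {{if()}} with {{None}} as {{NULL}}. The helper names are hypothetical, and the semantics assumed are those of standard SQL: {{IS DISTINCT FROM}} never returns {{NULL}}, and {{if()}} takes its else branch when the condition is false or {{NULL}}.

```python
def is_distinct_from(a, b) -> bool:
    """NULL-safe inequality: two NULLs are not distinct, NULL vs value is."""
    if a is None or b is None:
        return (a is None) != (b is None)
    return a != b


def sql_if(cond, then_val, else_val):
    """if(cond, a, b): the else branch is taken when cond is FALSE or NULL."""
    return then_val if cond is True else else_val


def nullif_rewritten(expr1, expr2):
    """nullif(expr1, expr2) expressed as if(expr1 IS DISTINCT FROM expr2, expr1, NULL)."""
    return sql_if(is_distinct_from(expr1, expr2), expr1, None)
```

With this model, equal arguments give {{NULL}}, different arguments give {{expr1}}, and a {{NULL}} first argument gives {{NULL}} regardless of the second.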

h4. {{NVL2(expr, trueExpr, falseExpr)}}

FE, {{FunctionCallExpr}}: {{nvl2(expr, trueExpr, falseExpr)}} → {{if(expr 
IS NOT NULL, trueExpr, falseExpr)}}

nvl2 vanishes from the plan after this step. There is no entry for {{nvl2()}} 
in {{impala_functions.py}}.

h4. {{ISNULL(a, b)}}

FE, {{SimplifyConditional}}: Treated the same as {{IFNULL(a, b)}}, but not 
rewritten to that form.

BE: An alias for this function exists in {{impala_functions.py}}.

*Suggestion:* Rewrite to {{IFNULL(a, b)}} and drop the alias from 
{{impala_functions.py}} to make things a bit more tidy.

h4. {{NVL(a, b)}}

FE, {{SimplifyConditional}}: Treated the same as {{IFNULL(a, b)}}, but not 
rewritten to that form.

BE: An alias for this function exists in {{impala_functions.py}}.

*Suggestion:* Rewrite to {{IFNULL(a, b)}} and drop the alias from 
{{impala_functions.py}} to make things a bit more tidy.

h4. {{IFNULL(a, b)}}

FE, {{SimplifyConditional}}:
* {{ifnull(NULL, x)}} → {{x}}
* {{ifnull(a, a)}} → {{a}}

BE: Entry exists in {{impala_functions.py}}.
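
The two simplifications listed above can be sketched as a small rewrite function. This is a hypothetical Python model, not the Java {{SimplifyConditional}} rule: expressions are plain tuples, {{NULL}} is the tuple {{("null",)}}, and {{isnull}} and {{nvl}} are accepted alongside {{ifnull}} because the notes above say they are treated the same way.

```python
NULL = ("null",)  # hypothetical marker for the SQL NULL literal

# Function names handled identically, per the notes above.
IFNULL_NAMES = {"ifnull", "isnull", "nvl"}


def simplify_ifnull(expr):
    """Apply ifnull(NULL, x) -> x and ifnull(a, a) -> a; otherwise return expr unchanged."""
    is_call = isinstance(expr, tuple) and len(expr) == 3 and expr[0] in IFNULL_NAMES
    if not is_call:
        return expr
    _, a, b = expr
    if a == NULL:
        return b  # first argument is the NULL literal
    if a == b:
        return a  # both arguments are the same expression
    return expr
```

For instance, {{simplify_ifnull(("ifnull", NULL, ("col", "x")))}} returns {{("col", "x")}}, while a call with two different non-NULL arguments is left as is.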


> Codegen output for conditional functions (if,isnull, coalesce) is very 
> suboptimal
> ---------------------------------------------------------------------------------
>
>                 Key: IMPALA-7655
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7655
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Tim Armstrong
>            Priority: Major
>              Labels: codegen, perf, performance
>
> https://gerrit.cloudera.org/#/c/11565/ provided a clue that an aggregation 
> involving an if() function was very slow, 10x slower than the equivalent 
> version using a case:
> {noformat}
> [localhost:21000] default> set num_nodes=1; set mt_dop=1; select count(case when l_orderkey is NULL then 1 else NULL end) from tpch10_parquet.lineitem;summary;
> NUM_NODES set to 1
> MT_DOP set to 1
> Query: select count(case when l_orderkey is NULL then 1 else NULL end) from tpch10_parquet.lineitem
> Query submitted at: 2018-10-04 11:17:31 (Coordinator: http://tarmstrong-box:25000)
> Query progress can be monitored at: http://tarmstrong-box:25000/query_plan?query_id=274b2a6f35cefe31:95a1964200000000
> +----------------------------------------------------------+
> | count(case when l_orderkey is null then 1 else null end) |
> +----------------------------------------------------------+
> | 0                                                        |
> +----------------------------------------------------------+
> Fetched 1 row(s) in 0.51s
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> | Operator     | #Hosts | Avg Time | Max Time | #Rows  | Est. #Rows | Peak Mem | Est. Peak Mem | Detail                  |
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> | 01:AGGREGATE | 1      | 44.03ms  | 44.03ms  | 1      | 1          | 25.00 KB | 10.00 MB      | FINALIZE                |
> | 00:SCAN HDFS | 1      | 411.57ms | 411.57ms | 59.99M | -1         | 16.61 MB | 88.00 MB      | tpch10_parquet.lineitem |
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> [localhost:21000] default> set num_nodes=1; set mt_dop=1; select count(if(l_orderkey is NULL, 1, NULL)) from tpch10_parquet.lineitem;summary;
> NUM_NODES set to 1
> MT_DOP set to 1
> Query: select count(if(l_orderkey is NULL, 1, NULL)) from tpch10_parquet.lineitem
> Query submitted at: 2018-10-04 11:23:07 (Coordinator: http://tarmstrong-box:25000)
> Query progress can be monitored at: http://tarmstrong-box:25000/query_plan?query_id=8e46ab1b84c4dbff:2786ca2600000000
> +----------------------------------------+
> | count(if(l_orderkey is null, 1, null)) |
> +----------------------------------------+
> | 0                                      |
> +----------------------------------------+
> Fetched 1 row(s) in 1.01s
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> | Operator     | #Hosts | Avg Time | Max Time | #Rows  | Est. #Rows | Peak Mem | Est. Peak Mem | Detail                  |
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> | 01:AGGREGATE | 1      | 422.07ms | 422.07ms | 1      | 1          | 25.00 KB | 10.00 MB      | FINALIZE                |
> | 00:SCAN HDFS | 1      | 511.13ms | 511.13ms | 59.99M | -1         | 16.61 MB | 88.00 MB      | tpch10_parquet.lineitem |
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> {noformat}
> It turns out that this is because we don't have good codegen support for 
> ConditionalFunction, and just fall back to emitting a call to the interpreted 
> path: 
> https://github.com/apache/impala/blob/master/be/src/exprs/conditional-functions.cc#L28
> See CaseExpr for an example of much better codegen support: 
> https://github.com/apache/impala/blob/master/be/src/exprs/case-expr.cc#L178


