[
https://issues.apache.org/jira/browse/IMPALA-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657532#comment-16657532
]
Paul Rogers edited comment on IMPALA-7655 at 10/23/18 9:33 PM:
---------------------------------------------------------------
Work on this ticket required a broad review of conditional functions as
summarized in IMPALA-7747. The notes below focus on the functions covered in
this ticket.
h4. {{ISNULL(a, b)}}
BE: Alias for this method exist in {{impala_functions.py}}, special
implementation in {{conditional-functions.[h|cc]}}.
*Suggestion:* Rewrite as:
{code:sql}
CASE a IS NULL THEN b ELSE a END
{code}
Since {{isnull()}} would vanish from the plan after this transform, remove the
BE implementation. Ensure that the entry in {{impala_functions.py}} remains so
that the function appears in the list of built-in functions.
h4. {{NVL(a, b)}} \\ {{IFNULL(a, b)}}
FE, {{SimplifyConditional}}: Treated same as {{ISNULL(a, b)}}, but is not
rewritten to this form.
BE: Alias for this method exist in {{impala_functions.py}}.
*Suggestion:* Rewrite to {{ISNULL(a, b)}}, to make things a bit more tidy.
h4. {{IF(cond, trueExpr, falseExpr)}}
FE: {{SimplifyConditional}} performs basic simplifications.
BE: Implemented in {{conditional-functions.[h|cc]}} as an interpreted-only
function to allow short-circuit argument evaluation.
*Suggestion:* Rewrite in the FE to
{code:sql}
CASE WHEN cond THEN trueExpr ELSE falseExpr END
{code}
{{IF()}} will then vanish from the plan so remove the BE implementation,
leaving the entry in {{impala_functions.py}}.
h4. {{COALESCE(e1, e2, … en)}}
FE: {{SimplifyConditional}} performs basic simplifications.
BE: Implemented in {{conditional-functions.[h|cc]}} as an interpreted-only
function to allow short-circuit argument evaluation.
*Suggestion:* Rewrite in the FE to
{noformat}
CASE WHEN [ei IS NOT NULL THEN ei]* ELSE en END
{noformat}
When doing so, extend two existing optimizations.
1. Remove not only leading null values, but all null values.
2. Special case not just the last non-null literal, but rather when
encountering the first such value, drop all remaining terms.
{{COLAESCE()}} will then vanish from the plan so remove the BE implementation.
h4. Remove {{conditional-functions.[h|cc]}}
Since the above will remove the the three special conditional functions, remove
{{conditional-functions.[h|cc]}} as well.
was (Author: paul.rogers):
Work on this ticket required a broad review of conditional functions as
summarized in IMPALA-7747. The notes below focus on the functions covered in
this ticket.
h4. {{ISNULL(a, b)}}
BE: Alias for this method exist in {{impala_functions.py}}, special
implementation in {{conditional-functions.[h|cc]}}.
*Suggestion:* Rewrite as:
{code:sql}
CASE a IS NULL THEN b ELSE a END
{code}
Since {{isnull()}} would vanish from the plan after this transform, remove the
BE implementation. Ensure that the entry in {{impala_functions.py}} remains so
that the function appears in the list of built-in functions.
h4. {{IF(cond, trueExpr, falseExpr)}}
FE: {{SimplifyConditional}} performs basic simplifications.
BE: Implemented in {{conditional-functions.[h|cc]}} as an interpreted-only
function to allow short-circuit argument evaluation.
*Suggestion:* Rewrite in the FE to
{code:sql}
CASE WHEN cond THEN trueExpr ELSE falseExpr END
{code}
{{IF()}} will then vanish from the plan so remove the BE implementation,
leaving the entry in {{impala_functions.py}}.
h4. {{COALESCE(e1, e2, … en)}}
FE: {{SimplifyConditional}} performs basic simplifications.
BE: Implemented in {{conditional-functions.[h|cc]}} as an interpreted-only
function to allow short-circuit argument evaluation.
*Suggestion:* Rewrite in the FE to
{noformat}
CASE WHEN [ei IS NOT NULL THEN ei]* ELSE en END
{noformat}
When doing so, extend two existing optimizations.
1. Remove not only leading null values, but all null values.
2. Special case not just the last non-null literal, but rather when
encountering the first such value, drop all remaining terms.
{{COLAESCE()}} will then vanish from the plan so remove the BE implementation.
h4. Remove {{conditional-functions.[h|cc]}}
Since the above will remove the the three special conditional functions, remove
{{conditional-functions.[h|cc]}} as well.
> Codegen output for conditional functions (if,isnull, coalesce) is very
> suboptimal
> ---------------------------------------------------------------------------------
>
> Key: IMPALA-7655
> URL: https://issues.apache.org/jira/browse/IMPALA-7655
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: Tim Armstrong
> Priority: Major
> Labels: codegen, perf, performance
>
> https://gerrit.cloudera.org/#/c/11565/ provided a clue that an aggregation
> involving an if() function was very slow, 10x slower than the equivalent
> version using a case:
> {noformat}
> [localhost:21000] default> set num_nodes=1; set mt_dop=1; select count(case
> when l_orderkey is NULL then 1 else NULL end) from
> tpch10_parquet.lineitem;summary;
> NUM_NODES set to 1
> MT_DOP set to 1
> Query: select count(case when l_orderkey is NULL then 1 else NULL end) from
> tpch10_parquet.lineitem
> Query submitted at: 2018-10-04 11:17:31 (Coordinator:
> http://tarmstrong-box:25000)
> Query progress can be monitored at:
> http://tarmstrong-box:25000/query_plan?query_id=274b2a6f35cefe31:95a1964200000000
> +----------------------------------------------------------+
> | count(case when l_orderkey is null then 1 else null end) |
> +----------------------------------------------------------+
> | 0 |
> +----------------------------------------------------------+
> Fetched 1 row(s) in 0.51s
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> | Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak
> Mem | Est. Peak Mem | Detail |
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> | 01:AGGREGATE | 1 | 44.03ms | 44.03ms | 1 | 1 | 25.00
> KB | 10.00 MB | FINALIZE |
> | 00:SCAN HDFS | 1 | 411.57ms | 411.57ms | 59.99M | -1 | 16.61
> MB | 88.00 MB | tpch10_parquet.lineitem |
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> [localhost:21000] default> set num_nodes=1; set mt_dop=1; select
> count(if(l_orderkey is NULL, 1, NULL)) from tpch10_parquet.lineitem;summary;
> NUM_NODES set to 1
> MT_DOP set to 1
> Query: select count(if(l_orderkey is NULL, 1, NULL)) from
> tpch10_parquet.lineitem
> Query submitted at: 2018-10-04 11:23:07 (Coordinator:
> http://tarmstrong-box:25000)
> Query progress can be monitored at:
> http://tarmstrong-box:25000/query_plan?query_id=8e46ab1b84c4dbff:2786ca2600000000
> +----------------------------------------+
> | count(if(l_orderkey is null, 1, null)) |
> +----------------------------------------+
> | 0 |
> +----------------------------------------+
> Fetched 1 row(s) in 1.01s
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> | Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak
> Mem | Est. Peak Mem | Detail |
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> | 01:AGGREGATE | 1 | 422.07ms | 422.07ms | 1 | 1 | 25.00
> KB | 10.00 MB | FINALIZE |
> | 00:SCAN HDFS | 1 | 511.13ms | 511.13ms | 59.99M | -1 | 16.61
> MB | 88.00 MB | tpch10_parquet.lineitem |
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> {noformat}
> It turns out that this is because we don't have good codegen support for
> ConditionalFunction, and just fall back to emitting a call to the interpreted
> path:
> https://github.com/apache/impala/blob/master/be/src/exprs/conditional-functions.cc#L28
> See CaseExpr for an example of much better codegen support:
> https://github.com/apache/impala/blob/master/be/src/exprs/case-expr.cc#L178
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]