[ 
https://issues.apache.org/jira/browse/IMPALA-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated IMPALA-7655:
--------------------------------
    Comment: was deleted

(was: The devil, on this kind of change, is always in the details. Here are 
some that must be resolved

h4. Rule Ordering Dependencies

Prior to this change, several functions were rewritten to use {{if}}, and 
{{if}} was simplified to handle degenerate cases. Now, those functions are 
rewritten to use {{CASE}}. This then leads to a rule-ordering issue. If we 
simplify {{ifNull(null, y)}} to {{CASE WHEN null is not null THEN null ELSE 
y}}, we'd expect this to further simplify to {{y}}.

In unit test path, each simplification rule is applied once. The 
constant-folding rule is applied before the if-rewrite. Thus, if we generate 
{{null is null}}, the constant folding rule won't be applied a second time. 
Instead, the {{ifnull()}} rewrite code must handle these optimizations.

Oddly, outside of the rule rewrite unit tests, rules are applied repeatedly, 
and so the upstream rewrites might be done. This seems a limitation in the 
testing framework.

In general, the conditional function rewrites must handle their own null-based 
simplifications; they can't assume {{CASE}} rewrites will handle them.

h4. Special Aggregate Handling

Another issue is how IMPALA-5125 was implemented: if rewriting an expression 
drops the last aggregate, the fix simply throws away the rewrite. This is fine 
for simplifications, but not for the conditional rewrites. Since there is no BE 
implementation for the conditional functions, we can't revert to the original 
form. This means the the conditional rewrites must be aware of aggregates.

h4. Conditional Rewrites Must Always Be Enabled

Next, {{BETWEEN}} is rewritten as {{IF()}}, but appears to be done *after* the 
conditional function rewrite rules, leaving an {{if()}} call in the final plan.

Turns out that this issue can occur because, in the {{Analyzer}}, the user can 
disable optimization. At present, the conditional rewrite is part of the 
{{SimplifyConditionalsRule}} which is optional. But, the conditional rewrite is 
not optional, it must always be done whether optimizations are on or not.

Further, if optimizations are on, we want to apply constant folding and other 
rules before the conditional rewrite.

As a result, the conditional rewrites need to be pulled out into a new rule. 
And, that rule must always be applied. If optimizations are on, apply the above 
optimizations first. Otherwise, just apply the rewrites directly.

h4. Handling {{NVL2()}} in the Function Registry

Finally, the change creates function registry entries for {{nvl2()}} so that it 
can be rewritten in the same place as the other conditional functions (directly 
to {{CASE}} without first going through {{if()}}). Since the signature of this 
function is n^2 in types, the function entry would be huge (which is probably 
why it was rewritten very early in the parse process originally.) A special 
handling of {{NVL2()}} (and, later, {{DECODE()}}, see IMPALA-7747) is needed. 
The special handling can exploit the fact that {{NVL2()}} is a useful fiction; 
a short-hand for a {{CASE}}, rather than being a real function that must 
enforce stricter rules.

h4. Issues with IS [NOT] DISTINCT FROM Optimizations

See IMPALA-7755.

h4. Incomplete Analysis After Rules Fire

See IMPALA-7754.

h4. Rewrite of the Wrong Order By Expressions

See IMPALA-7753.

h4. Invalid Logic to Reset a SELECT Statement

See IMPALA-7752.)

> Codegen output for conditional functions (if,isnull, coalesce) is very 
> suboptimal
> ---------------------------------------------------------------------------------
>
>                 Key: IMPALA-7655
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7655
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Tim Armstrong
>            Assignee: Paul Rogers
>            Priority: Major
>              Labels: codegen, perf, performance
>
> https://gerrit.cloudera.org/#/c/11565/ provided a clue that an aggregation 
> involving an if() function was very slow, 10x slower than the equivalent 
> version using a case:
> {noformat}
> [localhost:21000] default> set num_nodes=1; set mt_dop=1; select count(case 
> when l_orderkey is NULL then 1 else NULL end) from 
> tpch10_parquet.lineitem;summary;
> NUM_NODES set to 1
> MT_DOP set to 1
> Query: select count(case when l_orderkey is NULL then 1 else NULL end) from 
> tpch10_parquet.lineitem
> Query submitted at: 2018-10-04 11:17:31 (Coordinator: 
> http://tarmstrong-box:25000)
> Query progress can be monitored at: 
> http://tarmstrong-box:25000/query_plan?query_id=274b2a6f35cefe31:95a1964200000000
> +----------------------------------------------------------+
> | count(case when l_orderkey is null then 1 else null end) |
> +----------------------------------------------------------+
> | 0                                                        |
> +----------------------------------------------------------+
> Fetched 1 row(s) in 0.51s
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> | Operator     | #Hosts | Avg Time | Max Time | #Rows  | Est. #Rows | Peak 
> Mem | Est. Peak Mem | Detail                  |
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> | 01:AGGREGATE | 1      | 44.03ms  | 44.03ms  | 1      | 1          | 25.00 
> KB | 10.00 MB      | FINALIZE                |
> | 00:SCAN HDFS | 1      | 411.57ms | 411.57ms | 59.99M | -1         | 16.61 
> MB | 88.00 MB      | tpch10_parquet.lineitem |
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> [localhost:21000] default> set num_nodes=1; set mt_dop=1; select 
> count(if(l_orderkey is NULL, 1, NULL)) from tpch10_parquet.lineitem;summary;
> NUM_NODES set to 1
> MT_DOP set to 1
> Query: select count(if(l_orderkey is NULL, 1, NULL)) from 
> tpch10_parquet.lineitem
> Query submitted at: 2018-10-04 11:23:07 (Coordinator: 
> http://tarmstrong-box:25000)
> Query progress can be monitored at: 
> http://tarmstrong-box:25000/query_plan?query_id=8e46ab1b84c4dbff:2786ca2600000000
> +----------------------------------------+
> | count(if(l_orderkey is null, 1, null)) |
> +----------------------------------------+
> | 0                                      |
> +----------------------------------------+
> Fetched 1 row(s) in 1.01s
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> | Operator     | #Hosts | Avg Time | Max Time | #Rows  | Est. #Rows | Peak 
> Mem | Est. Peak Mem | Detail                  |
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> | 01:AGGREGATE | 1      | 422.07ms | 422.07ms | 1      | 1          | 25.00 
> KB | 10.00 MB      | FINALIZE                |
> | 00:SCAN HDFS | 1      | 511.13ms | 511.13ms | 59.99M | -1         | 16.61 
> MB | 88.00 MB      | tpch10_parquet.lineitem |
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> {noformat}
> It turns out that this is because we don't have good codegen support for 
> ConditionalFunction, and just fall back to emitting a call to the interpreted 
> path: 
> https://github.com/apache/impala/blob/master/be/src/exprs/conditional-functions.cc#L28
> See CaseExpr for an example of much better codegen support: 
> https://github.com/apache/impala/blob/master/be/src/exprs/case-expr.cc#L178



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to