[
https://issues.apache.org/jira/browse/IMPALA-7020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17145956#comment-17145956
]
ASF subversion and git services commented on IMPALA-7020:
---------------------------------------------------------
Commit 62729980d9b0b458f8eeff506ce4f2456b16dfc1 in impala's branch
refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=6272998 ]
IMPALA-7020: fix costing of non-trivial CAST expressions
Some cast operations are quite expensive to evaluate,
which was not reflected in the uniform costing of CAST
expressions.
We fix this by increasing the cost of non-trivial
casts to be the same as an arbitrary function call.
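The idea can be sketched roughly as follows. This is an illustrative model only, not Impala's planner code: the constant values, the set of types treated as "non-trivial", and the function names cast_cost and should_materialize are all assumptions made for the example.

```python
# Hypothetical cost constants; the values are illustrative, not Impala's.
TRIVIAL_CAST_COST = 1
FUNCTION_CALL_COST = 10

# Assumed threshold: a sort expression at or above this cost is materialized.
MATERIALIZATION_THRESHOLD = FUNCTION_CALL_COST

# Assumption: casts that involve any of these types count as "non-trivial".
EXPENSIVE_TYPES = {"STRING", "TIMESTAMP", "DATE"}


def cast_cost(child_cost, from_type, to_type):
    """Estimate the cost of CAST(child AS to_type) given the child's cost."""
    if from_type == to_type:
        # A cast to the same type adds no work in this model.
        return child_cost
    non_trivial = from_type in EXPENSIVE_TYPES or to_type in EXPENSIVE_TYPES
    # Non-trivial casts are charged like an arbitrary function call.
    return child_cost + (FUNCTION_CALL_COST if non_trivial else TRIVIAL_CAST_COST)


def should_materialize(expr_cost):
    """Decide whether the sort should compute the expression once and store it."""
    return expr_cost >= MATERIALIZATION_THRESHOLD


# A column reference is assumed to cost 1 in this sketch.
cheap = cast_cost(1, "INT", "BIGINT")
costly = cast_cost(1, "STRING", "DATE")
print(should_materialize(cheap), should_materialize(costly))
```

Under these assumptions, a STRING to DATE cast crosses the threshold and is materialized before the sort, while an INT to BIGINT cast stays cheap enough to re-evaluate.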
Testing:
Ran exhaustive tests.
Added planner tests to check that CAST expressions are
materialized or not based on the input and output
types - the planner output lists 'materialized:'
expressions for the SORT operator.
A few existing planner tests had changes in predicate
ordering. I checked manually that these changes made
sense.
Perf:
I sanity-checked that this actually helped (a variant of)
the example query from IMPALA-7020. The following query
went from ~8s to ~2s in my dev environment:
select *
FROM
  (
    SELECT
      o.*,
      ROW_NUMBER() OVER(ORDER BY evt_ts DESC) AS rn
    FROM
      (
        SELECT
          l_orderkey, l_partkey, l_linenumber, l_quantity,
          cast (l_shipdate as date) evt_ts
        FROM
          tpch_parquet.lineitem
      ) o
  ) r
WHERE
  rn BETWEEN 1 AND 101
ORDER BY rn;
Change-Id: I3f1a16fc45191a2eedf38cc243c70173d44806c6
Reviewed-on: http://gerrit.cloudera.org:8080/16073
Reviewed-by: Tim Armstrong <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Order by expressions in Analytical functions are not materialized causing
> slowdown
> ----------------------------------------------------------------------------------
>
> Key: IMPALA-7020
> URL: https://issues.apache.org/jira/browse/IMPALA-7020
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Affects Versions: Impala 2.12.0
> Reporter: Mostafa Mokhtar
> Assignee: Tim Armstrong
> Priority: Major
> Labels: performance
> Attachments: Slow case profile.txt, Workaround profile.txt
>
>
> Order by expressions in Analytical functions are not materialized and cause
> queries to run much slower.
> The rewrite for the query below is 20x faster, profiles attached.
> Repro
> {code}
> select *
> FROM
>   (
>     SELECT
>       o.*,
>       ROW_NUMBER() OVER(ORDER BY evt_ts DESC) AS rn
>     FROM
>       (
>         SELECT
>           l_orderkey, l_partkey, l_linenumber, l_quantity,
>           cast (l_shipdate as string) evt_ts
>         FROM
>           lineitem
>         WHERE
>           l_shipdate BETWEEN '1992-01-01 00:00:00' AND '1992-01-15 00:00:00'
>       ) o
>   ) r
> WHERE
>   rn BETWEEN 1 AND 101
> ORDER BY rn;
> {code}
> Workaround
> {code}
> select *
> FROM
>   (
>     SELECT
>       o.*,
>       ROW_NUMBER() OVER(ORDER BY evt_ts DESC) AS rn
>     FROM
>       (
>         SELECT
>           l_orderkey, l_partkey, l_linenumber, l_quantity,
>           cast (l_shipdate as string) evt_ts
>         FROM
>           lineitem
>         WHERE
>           l_shipdate BETWEEN '1992-01-01 00:00:00' AND '1992-01-15 00:00:00'
>         union all
>         SELECT
>           l_orderkey, l_partkey, l_linenumber, l_quantity,
>           cast (l_shipdate as string) evt_ts
>         FROM
>           lineitem limit 0
>       ) o
>   ) r
> WHERE
>   rn BETWEEN 1 AND 101
> ORDER BY rn;
> {code}