[
https://issues.apache.org/jira/browse/IMPALA-9035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949084#comment-16949084
]
徐洲 commented on IMPALA-9035:
----------------------------
It's my pleasure. I'll submit my patch to impala.
> Simplify cast string to timestamp
> ---------------------------------
>
> Key: IMPALA-9035
> URL: https://issues.apache.org/jira/browse/IMPALA-9035
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Reporter: 徐洲
> Priority: Critical
>
> Some BI tools generate compatible sqls to cast string to timestamp:
> cast(unix_timestamp('timestr', 'fmt') as timestamp) or
> cast(unix_timestamp('timestr') as timestamp)
> The internal type change in impala:
> StringVal->TimestampValue->BigIntVal→TimestampValue→TimestampVal
> In earlier 2.x versions, casting TimestampValue to BigIntVal using libc
> functions which is more expensive than current design with CCTZ.
> However, the cast to BIgIntVal seems to be redundant. We can simplify
> cast(unix_timestamp('timestr', 'fmt') as timestamp) to
> to_timestamp('timestr', 'fmt'), simplify cast(unix_timestamp('timestr') as
> timestamp) to cast('timestr' as timestamp).
> I managed to modify the fe to support such expr rewriting.
> Benefiting from the rewriting, the query time cost is reduced from 2.52s to
> 2.02s.
> Here is the sql:
> SELECT `t2`.`phy_category1_name` `d0`, SUM(`t1`.`apply_count`) `m0` FROM
> `tr_orders` `t1` LEFT JOIN `dim_sku` `t2` ON (`t1`.`sku_id` = `t2`.`sku_id`)
> WHERE (((CAST(UNIX_TIMESTAMP(`t1`.`date_id`) AS TIMESTAMP)) >=
> cast('2018-08-26' as timestamp)) AND ((CAST(UNIX_TIMESTAMP(`t1`.`date_id`) AS
> TIMESTAMP)) < cast('2019-09-02' as timestamp))) GROUP BY
> `t2`.`phy_category1_name` order by d0;
> Query: SELECT `t2`.`phy_category1_name` `d0`, SUM(`t1`.`apply_count`) `m0`
> FROM `tr_orders` `t1` LEFT JOIN `dim_sku` `t2` ON (`t1`.`sku_id` =
> `t2`.`sku_id`) WHERE (((CAST(UNIX_TIMESTAMP(`t1`.`date_id`) AS TIMESTAMP)) >=
> cast('2018-08-26' as timestamp)) AND ((CAST(UNIX_TIMESTAMP(`t1`.`date_id`) AS
> TIMESTAMP)) < cast('2019-09-02' as timestamp))) GROUP BY
> `t2`.`phy_category1_name` order by d0
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]