[
https://issues.apache.org/jira/browse/CALCITE-2974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825611#comment-16825611
]
Valeriy Trofimov commented on CALCITE-2974:
-------------------------------------------
Sorry for the confusion. I want to give a summary of what’s going on here.
What was the issue:
* My team that uses Calcite for our project. We noticed that Calcite
RexSimplify.simplifyCast() method was contributing a lot to the query
optimization latency time.
* Farther testing revealed that string-to-timestamp conversion performed in
RexExecutor.reduce() was the slowest. This was due to RexExecutor.reduce()
function used code generation to perform string-to-timestamp conversion.
* Furthermore, this string-to-timestamp conversion was performed tens of times
in a single query optimization process (VolcanoPlanner).
* The code generation process resulted significant performance hit in the
following way:
** The code generation was very slow due to dynamic compilation.
** The dynamic compilation process requires a class-level lock in the
ClassLoader, which incurred thread contention in highly concurrent environments
such as our production case.
** The code generation and one-time-use object also contributed a lot to the
GC overhead of JVM.
What is the proposal
* We've realized that it's possible to improve string-to-timestamp conversion
by direct function calls (instead of code generation) string-to-timestamp
conversion.
* The proposed solution worked as best effort optimization. If we can’t do the
direct conversion, we will fallback to the existing code flow. So, there
should be no behavior change.
How much was the improvement (all the results were tested in a single thread):
* The proposed code showed performance improvements depending how many times
the string-to-time conversion was performed.
| |Tp50(ns)|Tp90(ns)|Tp99(ns)|
|Code generation string-to-timestamp|1638788|2473212|4087758|
|Direct function call string-to-timestamp|3394|9212|24728|
|improvement ratio|482.8486|268.4772|165.3089|
* For the following query, there are 18 times string-to-timestamp conversion
performed.
** *"select sum(**\"**unit_sales**\"**) as s,**\n**"*
** + *" count(**\"**store_sqft**\"**) as c**\n**"*
** + *"from* *\"**foodmart**\"\n**"*
** + *"where* *\"**timestamp**\"* *>= '1996-01-01 00:00:00
UTC' and "*
** + *"* *\"**timestamp**\"* *< '1998-01-01 00:00:00
UTC'**\n**"*
** + *"group by floor(**\"**timestamp**\"* *to MONTH) order by
s asc"*;
| |Tp50(ns)|Tp90(ns)|Tp99(ns)|
|Code generation string-to-timestamp|41917575|51332363|86035878|
|Direct function call string-to-timestamp|8688970|12422303|28455757|
|improvement ratio|4.824228|4.132274|3.023496|
* For the following query, there are 108 times string-to-timestamp conversion
performed.
** *"select* *\"**state_province**\"**, count(*) as c**\n**"*
** + *"from* *\"**foodmart**\"\n**"*
** + *"where* *\"**timestamp**\"* *>= '1996-01-01 11:00:00
UTC' and "*
** + *"* *\"**timestamp**\"* *< '1996-01-01 12:00:00
UTC'**\n**"*
** + *"and* *\"**product_name**\"* *= 'High Top Dried
Mushrooms'**\n**"*
** + *"group by* *\"**state_province**\"\n**"*;
| |Tp50(ns)|Tp90(ns)|Tp99(ns)|
|Code generation string-to-timestamp|224482909|289397819|384670060|
|Direct function call string-to-timestamp|28631272|37379878|69716849|
|improvement ratio|7.840479773|7.742075001|5.517605364|
* To sum up, our tests showed that for the queries we tested, the direct
timestamp conversion showed more than 3 times improvements in total query
optimization latency. We think this is significant and generic enough to
benefit the community.
* Answer to Danny's question is that simplify calls make up about 75% of query
optimization time.
> Timestamp conversion performance can be improved
> ------------------------------------------------
>
> Key: CALCITE-2974
> URL: https://issues.apache.org/jira/browse/CALCITE-2974
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.19.0
> Reporter: Valeriy Trofimov
> Priority: Major
> Labels: easyfix, performance
> Fix For: next
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> RexSimplify.simplifyCast() is slow when converting a string to SQL Timestamp
> value. The slowness is caused by this line:
> {code:java}
> executor.reduce(rexBuilder, ImmutableList.of(e), reducedValues);
> {code}
> Debugging this code line with my team showed that for timestamp conversion it
> loads a pre-compiled conversion code, which makes it slow. My team proposes
> to replace this line with the following code that greately improves
> performance:
> {code:java}
> if (typeName == SqlTypeName.CHAR && e.getType().getSqlTypeName() ==
> SqlTypeName.TIMESTAMP_WITH_LOCAL_TIME_ZONE) {
> if (literal.getValue() instanceof NlsString) {
> String timestampStr = ((NlsString) literal.getValue()).getValue();
> Long timestampLong =
> SqlFunctions.toTimestampWithLocalTimeZone(timestampStr);
> reducedValues.add(rexBuilder.makeLiteral(timestampLong, e.getType(),
> true));
> }
> }
> if (reducedValues.isEmpty()) {
> executor.reduce(rexBuilder, ImmutableList.of(e), reducedValues);
> }
> {code}
> Let us know if we can submit a pull request with this code change or if we've
> missed anything. Do you know the reason behind using a pre-compiled code for
> timestamp conversion?
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)