[ 
https://issues.apache.org/jira/browse/CALCITE-2974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825611#comment-16825611
 ] 

Valeriy Trofimov commented on CALCITE-2974:
-------------------------------------------

Sorry for the confusion. I want to give a summary of what’s going on here.

 

What was the issue:
 * My team that uses Calcite for our project. We noticed that Calcite 
RexSimplify.simplifyCast() method was contributing a lot to the query 
optimization latency time.
 * Farther testing revealed that string-to-timestamp conversion performed in 
RexExecutor.reduce() was the slowest. This was due to RexExecutor.reduce() 
function used code generation to perform string-to-timestamp conversion.
 * Furthermore, this string-to-timestamp conversion was performed tens of times 
in a single query optimization process (VolcanoPlanner).
 * The code generation process resulted significant performance hit in the 
following way:
 ** The code generation was very slow due to dynamic compilation.
 ** The dynamic compilation process requires a class-level lock in the 
ClassLoader, which incurred thread contention in highly concurrent environments 
such as our production case.
 ** The code generation and one-time-use object also contributed a lot to the 
GC overhead of JVM.

 

What is the proposal
 * We've realized that it's possible to improve string-to-timestamp conversion 
by direct function calls (instead of code generation)  string-to-timestamp 
conversion.
 * The proposed solution worked as best effort optimization. If we can’t do the 
direct conversion, we will fallback to the existing code flow. So,  there 
should be no behavior change.

  

How much was the improvement (all the results were tested in a single thread):
 * The proposed code showed performance improvements depending how many times 
the string-to-time conversion was performed.

| |Tp50(ns)|Tp90(ns)|Tp99(ns)|
|Code generation string-to-timestamp|1638788|2473212|4087758|
|Direct function call string-to-timestamp|3394|9212|24728|
|improvement ratio|482.8486|268.4772|165.3089|

 
 * For the following query, there are 18 times string-to-timestamp conversion 
performed.
 ** *"select sum(**\"**unit_sales**\"**) as s,**\n**"*
         **         + *" count(**\"**store_sqft**\"**) as c**\n**"*
         **         + *"from* *\"**foodmart**\"\n**"*
         **         + *"where* *\"**timestamp**\"* *>= '1996-01-01 00:00:00 
UTC' and "*
         **         + *"* *\"**timestamp**\"* *< '1998-01-01 00:00:00 
UTC'**\n**"*
         **         + *"group by floor(**\"**timestamp**\"* *to MONTH) order by 
s asc"*;

 
| |Tp50(ns)|Tp90(ns)|Tp99(ns)|
|Code generation string-to-timestamp|41917575|51332363|86035878|
|Direct function call string-to-timestamp|8688970|12422303|28455757|
|improvement ratio|4.824228|4.132274|3.023496|

 
 * For the following query, there are 108 times string-to-timestamp conversion 
performed.
 ** *"select* *\"**state_province**\"**, count(*) as c**\n**"*
         **         + *"from* *\"**foodmart**\"\n**"*
         **         + *"where* *\"**timestamp**\"* *>= '1996-01-01 11:00:00 
UTC' and "*
         **         + *"* *\"**timestamp**\"* *< '1996-01-01 12:00:00 
UTC'**\n**"*
         **         + *"and* *\"**product_name**\"* *= 'High Top Dried 
Mushrooms'**\n**"*
         **         + *"group by* *\"**state_province**\"\n**"*;

 
| |Tp50(ns)|Tp90(ns)|Tp99(ns)|
|Code generation string-to-timestamp|224482909|289397819|384670060|
|Direct function call string-to-timestamp|28631272|37379878|69716849|
|improvement ratio|7.840479773|7.742075001|5.517605364|

 
 * To sum up, our tests showed that for the queries we tested, the direct 
timestamp conversion showed more than 3 times improvements in total query 
optimization latency. We think this is significant and generic enough to 
benefit the community.
 * Answer to Danny's question is that simplify calls make up about 75% of query 
optimization time.

> Timestamp conversion performance can be improved
> ------------------------------------------------
>
>                 Key: CALCITE-2974
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2974
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.19.0
>            Reporter: Valeriy Trofimov
>            Priority: Major
>              Labels: easyfix, performance
>             Fix For: next
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> RexSimplify.simplifyCast() is slow when converting a string to SQL Timestamp 
> value. The slowness is caused by this line:
> {code:java}
> executor.reduce(rexBuilder, ImmutableList.of(e), reducedValues);
> {code}
> Debugging this code line with my team showed that for timestamp conversion it 
> loads a pre-compiled conversion code, which makes it slow. My team proposes 
> to replace this line with the following code that greately improves 
> performance:
> {code:java}
> if (typeName == SqlTypeName.CHAR && e.getType().getSqlTypeName() == 
> SqlTypeName.TIMESTAMP_WITH_LOCAL_TIME_ZONE) {
>     if (literal.getValue() instanceof NlsString) {
>         String timestampStr = ((NlsString) literal.getValue()).getValue();
>         Long timestampLong = 
> SqlFunctions.toTimestampWithLocalTimeZone(timestampStr);
>         reducedValues.add(rexBuilder.makeLiteral(timestampLong, e.getType(), 
> true));
>     }
> }
> if (reducedValues.isEmpty()) {
>     executor.reduce(rexBuilder, ImmutableList.of(e), reducedValues);
> }
> {code}
> Let us know if we can submit a pull request with this code change or if we've 
> missed anything. Do you know the reason behind using a pre-compiled code for 
> timestamp conversion?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to