[
https://issues.apache.org/jira/browse/CALCITE-2974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16818220#comment-16818220
]
Valeriy Trofimov commented on CALCITE-2974:
-------------------------------------------
Doing a quick non-scientific perf measure by running a unit test like this:
{code:java}
@Test public void testCastTimestampJIRA2974() {
// Test cast for date/time/timestamp
tester.setFor(SqlStdOperatorTable.CAST);
tester.checkScalar(
"cast('1945-02-24 12:42:25.34' as TIMESTAMP)",
"1945-02-24 12:42:25",
"TIMESTAMP(0) NOT NULL");
}
{code}
On a RexSimplify.simplifyCast part of the code like this:
{code:java}
// current code:
//executor.reduce(rexBuilder, ImmutableList.of(e), reducedValues);
// proposed code:
// if converting from CHAR to TIMESTAMP, try to convert manually to improve perf
boolean isFromChar = typeName == SqlTypeName.CHAR;
boolean isToTimeStamp = e.getType().getSqlTypeName() == SqlTypeName.TIMESTAMP;
if (isFromChar) { // if converting from CHAR...
String toConvert = ((NlsString) literal.getValue()).getValue();
if (isToTimeStamp) { // ...to TIMESTAMP
try {
Long converted = DateTimeUtils.timestampStringToUnixDate(toConvert);
reducedValues.add(rexBuilder.makeLiteral(converted, e.getType(), true));
} catch (Exception e2) {
}
}
}
if (reducedValues.isEmpty()) { // if manual conversion failed, use reduce()
method
executor.reduce(rexBuilder, ImmutableList.of(e), reducedValues);
}
long t2 = java.lang.System.nanoTime();
System.out.println(t2 - t1);
{code}
Shows (t2 - t1) value of 1376990. If you comment out the proposed code and
uncomment the current code you will see the (t2 - t1) value of 323713768. It's
a 235-fold perf difference between current and proposed code. As Danny
mentioned, this is not the only code that gets run during query planning, but
it gets called a lot in our case, so the numbers add up. It's possible that we
have a bug in our code that calls it too many times - we'll look into that. But
we want to make our product as high-perf as possible, so we welcome any perf
improvements like this.
> Timestamp conversion performance can be improved
> ------------------------------------------------
>
> Key: CALCITE-2974
> URL: https://issues.apache.org/jira/browse/CALCITE-2974
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.19.0
> Reporter: Valeriy Trofimov
> Priority: Major
> Labels: easyfix, performance
> Fix For: next
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> RexSimplify.simplifyCast() is slow when converting a string to SQL Timestamp
> value. The slowness is caused by this line:
> {code:java}
> executor.reduce(rexBuilder, ImmutableList.of(e), reducedValues);
> {code}
> Debugging this code line with my team showed that for timestamp conversion it
> loads a pre-compiled conversion code, which makes it slow. My team proposes
> to replace this line with the following code that greately improves
> performance:
> {code:java}
> if (typeName == SqlTypeName.CHAR && e.getType().getSqlTypeName() ==
> SqlTypeName.TIMESTAMP_WITH_LOCAL_TIME_ZONE) {
> if (literal.getValue() instanceof NlsString) {
> String timestampStr = ((NlsString) literal.getValue()).getValue();
> Long timestampLong =
> SqlFunctions.toTimestampWithLocalTimeZone(timestampStr);
> reducedValues.add(rexBuilder.makeLiteral(timestampLong, e.getType(),
> true));
> }
> }
> if (reducedValues.isEmpty()) {
> executor.reduce(rexBuilder, ImmutableList.of(e), reducedValues);
> }
> {code}
> Let us know if we can submit a pull request with this code change or if we've
> missed anything. Do you know the reason behind using a pre-compiled code for
> timestamp conversion?
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)