[ 
https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17143173#comment-17143173
 ] 

Haisheng Yuan commented on CALCITE-3786:
----------------------------------------

OK, since you desire it so much, let me show you some graphs. The benchmark 
test hardly tell anything useful and match the real use case, let's run tpch 
query to see how it goes. Let's use {{TpchTest.testQuery05}}, which is slow 
enough to sample enough data for these methods.

1. Master before your change 
([978bb7|https://github.com/apache/calcite/commit/978bb7ea44969351468d1b5e240e8f57af7e5770])
 !screenshot-1.png|width=895,height=299! 
 !screenshot-2.png|width=850,height=364!

2. Your patch with part 2 
([b18470|https://github.com/apache/calcite/commit/b18470797eaec8c4e64382c0b742ae83f758275f])
 !screenshot-3.png|width=869,height=456! 
 !screenshot-4.png|width=851,height=407!

3. My patch committed to master that is 
[reverted|https://github.com/apache/calcite/commit/a551d4bfa3b97c42e74fe4e30ca66ba493719c96]
 by you 
([b00c1f|https://github.com/apache/calcite/commit/b00c1fd6e26b5e3cc1a810c9f862b184649cd1a6])
 !screenshot-5.png|width=861,height=299!

4. My patch that make digestEquals and digestHash private per your request 
([173624|https://github.com/apache/calcite/commit/1736242612634359164be483334884cb9113f804])
 !image-2020-06-23-12-47-25-599.png|width=869,height=307!
 Note that recomputeDigest in my patches costs nothing, it just clears the hash 
to 0.

It turned out my initial patch that got +1 from 2 committers, yet you said 
which is a mess, far from +1, then reverted, yields the best performance and 
use the least memory. Even my last patch with private digestEquals after your 
review, still perform better than yours and master with string digest.

Now do you think it is worth allowing digestEquals and digestHash to be 
protected instead of private, so that if downstream projects think the default 
digestEquals and digestHash are performance bottleneck (it doesn't seem to be 
for this TPCH query), they can override it like 
[this|https://github.com/apache/calcite/commit/b00c1fd6e26b5e3cc1a810c9f862b184649cd1a6#diff-16c976c118188d143e34105a8c79f235],
 or they can choose to override the default behavior in case the downstream 
data type doesn't have fields in CALCITE-3793 that the default methods can't 
handle appropriately?

> Add Digest interface to enable efficient hashCode(equals) for RexNode and 
> RelNode
> ---------------------------------------------------------------------------------
>
>                 Key: CALCITE-3786
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3786
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 1.21.0
>            Reporter: Vladimir Sitnikov
>            Assignee: Danny Chen
>            Priority: Major
>             Fix For: 1.24.0
>
>         Attachments: image-2020-06-23-12-47-25-599.png, screenshot-1.png, 
> screenshot-2.png, screenshot-3.png, screenshot-4.png, screenshot-5.png, 
> screenshot-6.png
>
>          Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> Current digests for RexNode, RelNode, RelType, and similar cases use String 
> concatenation.
> It is easy to implement, however, it has drawbacks:
> 1) String objects cannot be reused. For instance, RexCall has operands, 
> however, the digest is duplicated. It causes extra memory use and extra CPU 
> for string copying
> 2) There's no way to have multiple #toString() methods. RelType might need 
> multiple digests: "including field names", "excluding field names".
> A suggested resolution might be behind the lines of
> {code:java}
> class Digest { // immutable
>   final int hashCode; // speedup hashCode and equals
>   final Object[] contents; // The values are either other Digest objects or 
> Strings
>   String toString(); // e.g. for debugging purposes
>   int compareTo(Digest); // e.g. for debugging purposes.
> }
> {code}
> Note how fields in Kotlin are aligned much better, and it makes it easier to 
> read:
> {code:java}
> class Digest { // immutable
>   val hashCode: Int // speedup hashCode and equals
>   val contents: Array<Any> // The values are either other Digest objects or 
> Strings
>   fun toString(): String // e.g. for debugging purposes
>   fun compareTo(other: Digest): Int // e.g. for debugging purposes.
> }
> {code}
> Then the digest for RexCall could be the bits relevant to RexCall itself + 
> digests of the operands (which can be reused as is)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to