[ 
https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17142038#comment-17142038
 ] 

Haisheng Yuan commented on CALCITE-3786:
----------------------------------------

[~danny0405] Thank you for replying and reviewing. You are correct that using 
DIGEST_ATTRIBUTE can avoid plan diffs. However, don't you think it would be 
better if each operator could control how it defines its own equivalence? As I 
said in the previous example, each operator has to getFieldList from the row 
type and compare, even though some operators don't want to do so, and they 
have no control over it. You already have an example in [your 
code|https://github.com/apache/calcite/commit/69f25863f5f4197c17927a39a82cbf1cffd12b80#diff-cb90b503c85180c9be40a4984f8a1a54R128].
 We can't know all the different requirements beforehand, right? Do you think 
it is better to leave some room for operators to define their own logic?

Anyway, to ease your concern, digestEquals and digestHash are now private, and 
there is only one RelDigest subclass. There are so many inner classes in 
Calcite that we can't remove them all, right? And we had better keep 
{{getDigest}} backward compatible if we can do so easily, do you agree?
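
To illustrate what "leaving room for operators" could look like, here is a minimal sketch. All names in it ({{Op}}, {{StrictOp}}, {{NameAgnosticOp}}, {{digestItems}}) are hypothetical and are not Calcite API; it only assumes that equivalence is derived from a list of items that a subclass may override.

```java
import java.util.Arrays;
import java.util.List;

public class OperatorEquivalenceSketch {
  /** Hypothetical operator base class; it mimics the idea only, not Calcite's RelNode. */
  abstract static class Op {
    final String name;
    final List<String> fieldNames;
    final List<String> fieldTypes;

    Op(String name, List<String> fieldNames, List<String> fieldTypes) {
      this.name = name;
      this.fieldNames = fieldNames;
      this.fieldTypes = fieldTypes;
    }

    /** Items that define equivalence. Default: name, field names, and field types. */
    protected List<Object> digestItems() {
      return Arrays.<Object>asList(name, fieldNames, fieldTypes);
    }

    @Override public final boolean equals(Object o) {
      return o != null
          && getClass() == o.getClass()
          && digestItems().equals(((Op) o).digestItems());
    }

    @Override public final int hashCode() {
      return digestItems().hashCode();
    }
  }

  /** Keeps the default, so field names take part in equivalence. */
  static class StrictOp extends Op {
    StrictOp(String name, List<String> fieldNames, List<String> fieldTypes) {
      super(name, fieldNames, fieldTypes);
    }
  }

  /** Opts out of field-name comparison by overriding digestItems(). */
  static class NameAgnosticOp extends Op {
    NameAgnosticOp(String name, List<String> fieldNames, List<String> fieldTypes) {
      super(name, fieldNames, fieldTypes);
    }

    @Override protected List<Object> digestItems() {
      return Arrays.<Object>asList(name, fieldTypes);
    }
  }

  public static void main(String[] args) {
    List<String> types = Arrays.asList("INTEGER", "VARCHAR");
    List<String> namesA = Arrays.asList("a", "b");
    List<String> namesB = Arrays.asList("x", "y");

    // Same types, different field names: the strict operator treats them as different.
    System.out.println(new StrictOp("Project", namesA, types)
        .equals(new StrictOp("Project", namesB, types)));
    // The name-agnostic operator treats them as equivalent.
    System.out.println(new NameAgnosticOp("Project", namesA, types)
        .equals(new NameAgnosticOp("Project", namesB, types)));
  }
}
```

With a hook like this, the default behaviour stays in one place and only the operators that need different semantics have to override it.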

> Add Digest interface to enable efficient hashCode(equals) for RexNode and 
> RelNode
> ---------------------------------------------------------------------------------
>
>                 Key: CALCITE-3786
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3786
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 1.21.0
>            Reporter: Vladimir Sitnikov
>            Assignee: Danny Chen
>            Priority: Major
>             Fix For: 1.24.0
>
>          Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> Current digests for RexNode, RelNode, RelType, and similar cases use String 
> concatenation.
> It is easy to implement; however, it has drawbacks:
> 1) String objects cannot be reused. For instance, RexCall has operands, but 
> their digests are duplicated in the call's digest. This causes extra memory 
> use and extra CPU for string copying.
> 2) There's no way to have multiple #toString() methods. RelType might need 
> multiple digests: "including field names", "excluding field names".
> A suggested resolution might be along the lines of
> {code:java}
> class Digest { // immutable
>   final int hashCode; // speedup hashCode and equals
>   final Object[] contents; // The values are either other Digest objects or Strings
>   String toString(); // e.g. for debugging purposes
>   int compareTo(Digest); // e.g. for debugging purposes.
> }
> {code}
> Note how fields in Kotlin are aligned much better, and it makes it easier to 
> read:
> {code:java}
> class Digest { // immutable
>   val hashCode: Int // speedup hashCode and equals
>   val contents: Array<Any> // The values are either other Digest objects or Strings
>   fun toString(): String // e.g. for debugging purposes
>   fun compareTo(other: Digest): Int // e.g. for debugging purposes.
> }
> {code}
> Then the digest for RexCall could be the bits relevant to RexCall itself + 
> digests of the operands (which can be reused as is)
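
The composition described in the issue (the bits relevant to a call plus the reused digests of its operands) might be sketched as follows. This is only an illustration under stated assumptions: {{DigestSketch}} is a hypothetical class, not the one added to Calcite, and it omits the compareTo method from the proposal.

```java
import java.util.Arrays;

/** Hypothetical immutable digest; contents are Strings or other DigestSketch objects. */
public final class DigestSketch {
  private final int hashCode;      // computed once, so hashCode() is O(1) afterwards
  private final Object[] contents;

  public DigestSketch(Object... contents) {
    this.contents = contents.clone();
    // Nested digests return their cached hash, so no subtree is re-walked here.
    this.hashCode = Arrays.hashCode(this.contents);
  }

  @Override public int hashCode() {
    return hashCode;
  }

  @Override public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof DigestSketch)) {
      return false;
    }
    DigestSketch other = (DigestSketch) o;
    // A cheap hash comparison rejects most non-equal digests before the deep compare.
    return hashCode == other.hashCode && Arrays.equals(contents, other.contents);
  }

  /** Builds the string form on demand, e.g. for debugging. */
  @Override public String toString() {
    StringBuilder sb = new StringBuilder();
    for (Object item : contents) {
      sb.append(item);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    DigestSketch left = new DigestSketch("$0");
    DigestSketch right = new DigestSketch("$1");
    // The call's digest references the operand digests instead of copying their strings.
    DigestSketch plus = new DigestSketch("+(", left, ", ", right, ")");
    System.out.println(plus); // prints +($0, $1)
  }
}
```

The operand digests are stored by reference, so a parent digest shares them rather than duplicating their text, and the string is only materialized when toString() is actually called.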



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
