[ 
https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17141240#comment-17141240
 ] 

Haisheng Yuan commented on CALCITE-3786:
----------------------------------------

{quote}
We should be cautious for #equals #hashcode #explainTerms to have similar 
variables, it is error-prone for downstream projects to implement for their 
custom nodes
{quote}
I astonishingly found that you are correct. #equals #hashCode #toString have 
similar variables likewise, e.g. 
[StreamShardMetadata.java|https://github.com/apache/flink/blob/8674b69964eae50cad024f2c5caf92a71bf21a09/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/model/StreamShardMetadata.java#L112]
 in Flink, it is exactly what you said. Should I open a pull request to Flink 
to always use "toString().hashCode" and "toString().equals" or use Digest 
instead? What is worse, 90% of Java classes have exactly same variables for 
these methods. The worst thing is, they are default Java methods for every Java 
object. Damn it! We should propose to JCP members to make Object#hashCode and 
#equals final so that we have a bug-free world.

{quote}
The additional objects references should not be a bottleneck, removing it does 
not gain much.
{quote}
After deep meditation, I finally got your reasoning. Maybe because I still live 
in an era when the RAM is 128 MB. I can't help reflecting myself, am I thinking 
wrong in CALCITE-4049? Initially I agree with Xiening and Julian's criticize 
about our buddy adding 4 bytes to the Cons data structure to serve his purpose. 
I pondered whether we are too nitpicking to him. 10000 ConsList only add 40KB, 
which is also acceptable. Isn't it?

Anyway, I have addressed your kind comments, removed the 
digestEquals/digestHash from operators as you suggested, hope that can ease 
your concern. I have opened a PR for review: 
https://github.com/apache/calcite/pull/2039, which is open for discussion, 
comment and -1. Thanks.

> Add Digest interface to enable efficient hashCode(equals) for RexNode and 
> RelNode
> ---------------------------------------------------------------------------------
>
>                 Key: CALCITE-3786
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3786
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 1.21.0
>            Reporter: Vladimir Sitnikov
>            Assignee: Danny Chen
>            Priority: Major
>             Fix For: 1.24.0
>
>          Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> Current digests for RexNode, RelNode, RelType, and similar cases use String 
> concatenation.
> It is easy to implement, however, it has drawbacks:
> 1) String objects cannot be reused. For instance, RexCall has operands, 
> however, the digest is duplicated. It causes extra memory use and extra CPU 
> for string copying
> 2) There's no way to have multiple #toString() methods. RelType might need 
> multiple digests: "including field names", "excluding field names".
> A suggested resolution might be behind the lines of
> {code:java}
> class Digest { // immutable
>   final int hashCode; // speedup hashCode and equals
>   final Object[] contents; // The values are either other Digest objects or 
> Strings
>   String toString(); // e.g. for debugging purposes
>   int compareTo(Digest); // e.g. for debugging purposes.
> }
> {code}
> Note how fields in Kotlin are aligned much better, and it makes it easier to 
> read:
> {code:java}
> class Digest { // immutable
>   val hashCode: Int // speedup hashCode and equals
>   val contents: Array<Any> // The values are either other Digest objects or 
> Strings
>   fun toString(): String // e.g. for debugging purposes
>   fun compareTo(other: Digest): Int // e.g. for debugging purposes.
> }
> {code}
> Then the digest for RexCall could be the bits relevant to RexCall itself + 
> digests of the operands (which can be reused as is)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to