[
https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17141240#comment-17141240
]
Haisheng Yuan commented on CALCITE-3786:
----------------------------------------
{quote}
We should be cautious for #equals #hashcode #explainTerms to have similar
variables, it is error-prone for downstream projects to implement for their
custom nodes
{quote}
I astonishingly found that you are correct. #equals #hashCode #toString have
similar variables likewise, e.g.
[StreamShardMetadata.java|https://github.com/apache/flink/blob/8674b69964eae50cad024f2c5caf92a71bf21a09/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/model/StreamShardMetadata.java#L112]
in Flink, it is exactly what you said. Should I open a pull request to Flink
to always use "toString().hashCode" and "toString().equals" or use Digest
instead? What is worse, 90% of Java classes have exactly same variables for
these methods. The worst thing is, they are default Java methods for every Java
object. Damn it! We should propose to JCP members to make Object#hashCode and
#equals final so that we have a bug-free world.
{quote}
The additional objects references should not be a bottleneck, removing it does
not gain much.
{quote}
After deep meditation, I finally got your reasoning. Maybe because I still live
in an era when the RAM is 128 MB. I can't help reflecting myself, am I thinking
wrong in CALCITE-4049? Initially I agree with Xiening and Julian's criticize
about our buddy adding 4 bytes to the Cons data structure to serve his purpose.
I pondered whether we are too nitpicking to him. 10000 ConsList only add 40KB,
which is also acceptable. Isn't it?
Anyway, I have addressed your kind comments, removed the
digestEquals/digestHash from operators as you suggested, hope that can ease
your concern. I have opened a PR for review:
https://github.com/apache/calcite/pull/2039, which is open for discussion,
comment and -1. Thanks.
> Add Digest interface to enable efficient hashCode(equals) for RexNode and
> RelNode
> ---------------------------------------------------------------------------------
>
> Key: CALCITE-3786
> URL: https://issues.apache.org/jira/browse/CALCITE-3786
> Project: Calcite
> Issue Type: New Feature
> Components: core
> Affects Versions: 1.21.0
> Reporter: Vladimir Sitnikov
> Assignee: Danny Chen
> Priority: Major
> Fix For: 1.24.0
>
> Time Spent: 8h 40m
> Remaining Estimate: 0h
>
> Current digests for RexNode, RelNode, RelType, and similar cases use String
> concatenation.
> It is easy to implement, however, it has drawbacks:
> 1) String objects cannot be reused. For instance, RexCall has operands,
> however, the digest is duplicated. It causes extra memory use and extra CPU
> for string copying
> 2) There's no way to have multiple #toString() methods. RelType might need
> multiple digests: "including field names", "excluding field names".
> A suggested resolution might be behind the lines of
> {code:java}
> class Digest { // immutable
> final int hashCode; // speedup hashCode and equals
> final Object[] contents; // The values are either other Digest objects or
> Strings
> String toString(); // e.g. for debugging purposes
> int compareTo(Digest); // e.g. for debugging purposes.
> }
> {code}
> Note how fields in Kotlin are aligned much better, and it makes it easier to
> read:
> {code:java}
> class Digest { // immutable
> val hashCode: Int // speedup hashCode and equals
> val contents: Array<Any> // The values are either other Digest objects or
> Strings
> fun toString(): String // e.g. for debugging purposes
> fun compareTo(other: Digest): Int // e.g. for debugging purposes.
> }
> {code}
> Then the digest for RexCall could be the bits relevant to RexCall itself +
> digests of the operands (which can be reused as is)
--
This message was sent by Atlassian Jira
(v8.3.4#803005)