[
https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17142117#comment-17142117
]
Danny Chen commented on CALCITE-3786:
-------------------------------------
> each operator has to getFieldList from rowtype and compare, but some
> operators don't want to do so, but they have no control
I would say that the row type equivalence is *a must* for current
transformation, you can not recognize that 2 nodes with different fields number
or types are equivalent. The planner also have a strict check inside actually.
Thus make the row type comparison transient to downstream project nodes reduce
the redundant work. My example code just illustrates that row type for
downstream may not have fields, but they must also obey the rule that the type
should equal(in the version before 1.22, we compare with just row type, but
later we switch to field list which may cause problem, the code fix that).
> Do you think it is better to leave some space for operators to define their
> own logic?
I did, that is why we have different SqlExplainLevel for customization of
different cases, for 90% cases the digest items and toString items are almost
the same, the Enum keeps the customization and also saves the redundant
interfaces. As for other things, i.e. row type. hints, traits, i don't think we
should/need to open customization for them, they are common and required.
Your patch collects the digest items every time we do #equals, and you have to
cache the hashcode for RexNodes, it sacrifice performance for memory, so i
would be hard to say it is better, we need a benchmark to prove that.
BTW, i don't think #getDigest should be a public API, the digest is a planner
internal abstraction, the only interface we should impose is #explainTerms, so
i don't think there is compatibility problem for master.
> Add Digest interface to enable efficient hashCode(equals) for RexNode and
> RelNode
> ---------------------------------------------------------------------------------
>
> Key: CALCITE-3786
> URL: https://issues.apache.org/jira/browse/CALCITE-3786
> Project: Calcite
> Issue Type: New Feature
> Components: core
> Affects Versions: 1.21.0
> Reporter: Vladimir Sitnikov
> Assignee: Danny Chen
> Priority: Major
> Fix For: 1.24.0
>
> Time Spent: 8h 50m
> Remaining Estimate: 0h
>
> Current digests for RexNode, RelNode, RelType, and similar cases use String
> concatenation.
> It is easy to implement, however, it has drawbacks:
> 1) String objects cannot be reused. For instance, RexCall has operands,
> however, the digest is duplicated. It causes extra memory use and extra CPU
> for string copying
> 2) There's no way to have multiple #toString() methods. RelType might need
> multiple digests: "including field names", "excluding field names".
> A suggested resolution might be behind the lines of
> {code:java}
> class Digest { // immutable
> final int hashCode; // speedup hashCode and equals
> final Object[] contents; // The values are either other Digest objects or
> Strings
> String toString(); // e.g. for debugging purposes
> int compareTo(Digest); // e.g. for debugging purposes.
> }
> {code}
> Note how fields in Kotlin are aligned much better, and it makes it easier to
> read:
> {code:java}
> class Digest { // immutable
> val hashCode: Int // speedup hashCode and equals
> val contents: Array<Any> // The values are either other Digest objects or
> Strings
> fun toString(): String // e.g. for debugging purposes
> fun compareTo(other: Digest): Int // e.g. for debugging purposes.
> }
> {code}
> Then the digest for RexCall could be the bits relevant to RexCall itself +
> digests of the operands (which can be reused as is)
--
This message was sent by Atlassian Jira
(v8.3.4#803005)