[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133914#comment-17133914 ]
Danny Chen commented on CALCITE-3786:
-------------------------------------
[~vladimirsitnikov] Thanks for sharing the background. If we already do the
normalization during planning, it would be better to make it explicit rather
than leave it as an implicit contract. That is to say, I prefer to keep the plan
structure in sync with the actual RexNode tree, so normalizing on creation seems
better. We can have a control flag there, just like for Rex simplification, but
the default value should be true.
Personally, I still question the gains of Rex normalization, because it makes
the RexNode code complex and introduces an implicit contract. But we have
already decided that Rex normalization is the way to go, so let's make the
implementation as good as we can.
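The following is a minimal sketch of "normalization on creation" behind a control flag that defaults to true. For illustration only, it assumes that normalization means putting the operands of a commutative operator into a canonical (sorted) order. `NormalizeOnCreation`, `Call`, `makeCall`, and the `normalize` flag are hypothetical names, not Calcite's API. The sketch assumes Java 16+ for records.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: normalize commutative calls at creation time. */
class NormalizeOnCreation {
  // Hypothetical control flag, analogous to a simplification switch; defaults to true.
  static boolean normalize = true;

  /** Minimal stand-in for a RexCall: an operator name plus operand digests. */
  record Call(String op, List<String> operands) {
    @Override public String toString() {
      return op + "(" + String.join(", ", operands) + ")";
    }
  }

  // Assumption for the sketch: only these operators are treated as commutative.
  static boolean isCommutative(String op) {
    return op.equals("=") || op.equals("AND") || op.equals("OR") || op.equals("+");
  }

  /** Factory: the only way to build a Call, so every stored tree is already normalized. */
  static Call makeCall(String op, String... operands) {
    String[] ops = operands.clone();
    if (normalize && isCommutative(op)) {
      Arrays.sort(ops); // canonical operand order, so equivalent calls share one digest
    }
    return new Call(op, List.of(ops));
  }

  public static void main(String[] args) {
    Call a = makeCall("=", "$1", "$0");
    Call b = makeCall("=", "$0", "$1");
    System.out.println(a);           // =($0, $1)
    System.out.println(a.equals(b)); // true
  }
}
```

Every call goes through the factory, so the tree that is stored and the digest that is compared always agree. The normalization is explicit in the structure rather than applied only when digests are compared during planning.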
[~hyuan] I can think of another benefit of using Digest: downstream projects
need only implement #explainTerms instead of also implementing #hashCode and
#equals, which are both error-prone. As for the additional memory consumed by
object references, assuming 100 bytes per Digest and 10,000 rel nodes, the total
is about 1 MB (100 x 10,000 = 1,000,000 bytes), which I think is acceptable. So
I would choose the more concise interface: a Digest behind #explainTerms.
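Below is a minimal sketch of "a Digest behind #explainTerms". A hypothetical base class derives hashCode and equals from the terms a node reports, so a downstream node only implements explainTerms. `Node`, `TermCollector`, `Scan`, and `Filter` are hypothetical stand-ins, not Calcite's RelNode or RelWriter API. The sketch assumes Java 16+ for pattern-matching instanceof.

```java
import java.util.ArrayList;
import java.util.List;

class DigestDemo {
  // Hypothetical stand-in for RelWriter: collects (name, value) terms in order.
  static final class TermCollector {
    final List<Object> terms = new ArrayList<>();

    TermCollector item(String name, Object value) {
      terms.add(name);
      terms.add(value);
      return this;
    }
  }

  // Hypothetical base class: hashCode/equals come from the explainTerms output.
  abstract static class Node {
    private List<Object> digest; // built lazily on first use, then cached
    private int hash;

    /** The only method a downstream node has to implement. */
    protected abstract void explainTerms(TermCollector pw);

    final List<Object> digest() {
      if (digest == null) {
        TermCollector pw = new TermCollector();
        pw.item("class", getClass().getSimpleName());
        explainTerms(pw);
        digest = List.copyOf(pw.terms); // immutable; rejects null terms
        hash = digest.hashCode();       // child nodes contribute their cached hashes
      }
      return digest;
    }

    @Override public final int hashCode() {
      digest();
      return hash;
    }

    @Override public final boolean equals(Object o) {
      return this == o
          || o instanceof Node other
              && hashCode() == other.hashCode() // cheap rejection first
              && digest().equals(other.digest());
    }

    @Override public String toString() {
      return digest().toString();
    }
  }

  // Downstream nodes: no hand-written hashCode or equals.
  static final class Scan extends Node {
    final String table;
    Scan(String table) { this.table = table; }
    @Override protected void explainTerms(TermCollector pw) { pw.item("table", table); }
  }

  static final class Filter extends Node {
    final Node input;
    final String condition;

    Filter(Node input, String condition) {
      this.input = input;
      this.condition = condition;
    }

    @Override protected void explainTerms(TermCollector pw) {
      pw.item("input", input).item("condition", condition);
    }
  }

  public static void main(String[] args) {
    Node f1 = new Filter(new Scan("EMP"), ">($1, 10)");
    Node f2 = new Filter(new Scan("EMP"), ">($1, 10)");
    System.out.println(f1.equals(f2));                  // true
    System.out.println(f1.hashCode() == f2.hashCode()); // true
  }
}
```

The digest list holds child nodes by reference rather than copying their strings, so each child's cached hash is reused. The lazy caching shown here is not thread-safe and would need care in a concurrent planner.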
> Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
> ---------------------------------------------------------------------------------
>
> Key: CALCITE-3786
> URL: https://issues.apache.org/jira/browse/CALCITE-3786
> Project: Calcite
> Issue Type: New Feature
> Components: core
> Affects Versions: 1.21.0
> Reporter: Vladimir Sitnikov
> Assignee: Danny Chen
> Priority: Major
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Current digests for RexNode, RelNode, RelType, and similar cases use String
> concatenation.
> It is easy to implement; however, it has drawbacks:
> 1) String objects cannot be reused. For instance, a RexCall has operands,
> yet its digest duplicates the operands' digest strings. This causes extra
> memory use and extra CPU for string copying.
> 2) There's no way to have multiple #toString() methods. RelType might need
> multiple digests: "including field names", "excluding field names".
> A suggested resolution might be along the lines of
> {code:java}
> class Digest { // immutable
> final int hashCode; // speedup hashCode and equals
> final Object[] contents; // The values are either other Digest objects or Strings
> String toString(); // e.g. for debugging purposes
> int compareTo(Digest); // e.g. for debugging purposes.
> }
> {code}
> Note how the fields in Kotlin align much better, which makes the sketch easier
> to read:
> {code:java}
> class Digest { // immutable
> val hashCode: Int // speedup hashCode and equals
> val contents: Array<Any> // The values are either other Digest objects or Strings
> fun toString(): String // e.g. for debugging purposes
> fun compareTo(other: Digest): Int // e.g. for debugging purposes.
> }
> {code}
> Then the digest for RexCall could be the bits relevant to RexCall itself plus
> the digests of its operands (which can be reused as is).
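This is one possible runnable rendering of the proposed Digest in Java, assuming Java 16+ for pattern-matching instanceof. `DigestSketch` and the varargs constructor are hypothetical additions, not part of the proposal or of Calcite. The hash is computed once in the constructor from the children's cached hashes, and operand digests are stored by reference, so a call's digest reuses its operands' digests as is.

```java
import java.util.Arrays;

class DigestSketch {
  /** Immutable digest: contents are Strings or other Digests, held by reference. */
  static final class Digest implements Comparable<Digest> {
    private final Object[] contents;
    private final int hashCode; // computed once; speeds up hashCode and equals

    Digest(Object... contents) {
      for (Object c : contents) {
        if (!(c instanceof String) && !(c instanceof Digest)) {
          throw new IllegalArgumentException("not a String or Digest: " + c);
        }
      }
      this.contents = contents.clone();
      this.hashCode = Arrays.hashCode(this.contents); // children supply cached hashes
    }

    @Override public int hashCode() { return hashCode; }

    @Override public boolean equals(Object o) {
      return this == o
          || o instanceof Digest d
              && hashCode == d.hashCode // cheap rejection first
              && Arrays.equals(contents, d.contents);
    }

    @Override public String toString() { // e.g. for debugging purposes
      StringBuilder sb = new StringBuilder();
      for (Object c : contents) {
        sb.append(c); // child Digests render recursively
      }
      return sb.toString();
    }

    @Override public int compareTo(Digest o) { // e.g. for debugging purposes
      return toString().compareTo(o.toString());
    }
  }

  public static void main(String[] args) {
    Digest ref = new Digest("$0"); // e.g. a RexInputRef
    Digest lit = new Digest("10"); // e.g. a RexLiteral
    // The call's own bits plus the operand digests, reused without copying strings.
    Digest call1 = new Digest(">(", ref, ", ", lit, ")");
    Digest call2 = new Digest(">(", new Digest("$0"), ", ", new Digest("10"), ")");
    System.out.println(call1);               // >($0, 10)
    System.out.println(call1.equals(call2)); // true
  }
}
```

As in the proposal, compareTo via toString() is only meant for debugging. It rebuilds the strings on every call. It can also return 0 for digests that render identically but are split differently (for example "a", "b" versus "ab"), while equals treats them as distinct.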
--
This message was sent by Atlassian Jira
(v8.3.4#803005)