[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17132944#comment-17132944 ]

Danny Chen commented on CALCITE-3786:
-------------------------------------

bq. Let's use Join operator as an example

Even with these additional object references, how many Join nodes are there 
during planning? They are not expected to be the cause of the OOM. What causes 
the OOM is the digest strings, especially for complex RexCalls: before this 
patch, their string digests were always concatenated into the digest of the 
RelNode, which consumed a lot of memory.

bq. Do we really need a tool to unify the logic?

Why do you think we should use #hashCode and #equals to decide whether two 
relational expressions are semantically equivalent? Or, to ask it a different 
way: if two relational expressions are equivalent, should they be equal to each 
other? (For example, two Projects that differ only in their field aliases.)
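The distinction between equality and semantic equivalence can be sketched with a hypothetical, simplified `Project` written as a Java 16+ record (not Calcite's Project class). The record's generated `equals`/`hashCode` cover every component, aliases included, while a separate, hypothetical `semanticallyEquivalent` method compares only the projected expressions.

```java
import java.util.List;

// Hypothetical, simplified Project node (NOT Calcite's Project class).
// A record's generated equals/hashCode cover ALL components, aliases included.
record Project(List<String> exprs, List<String> aliases) {

  // Hypothetical semantic check: compares only the projected expressions,
  // so two Projects that differ only in field aliases count as equivalent.
  boolean semanticallyEquivalent(Project other) {
    return exprs().equals(other.exprs());
  }
}

final class AliasDemo {
  public static void main(String[] args) {
    Project p1 = new Project(List.of("$0", "+($1, 1)"), List.of("a", "b"));
    Project p2 = new Project(List.of("$0", "+($1, 1)"), List.of("x", "y"));
    System.out.println(p1.equals(p2));                 // false: aliases differ
    System.out.println(p1.semanticallyEquivalent(p2)); // true: same expressions
  }
}
```

Keeping the two notions separate mirrors the issue's point that RelType may need both an "including field names" and an "excluding field names" digest.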

bq. Did you try to debug it? It is true that when you copy the RexCall it just 
passes the reference, but after column pruning or project transpose, the RexCall 
might be a completely new object,

No. If you have an example, can you share it so we can see how much memory 
planning would use? By the way, even if the RexCalls are new objects, comparing 
them with #equals would solve the problem.
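A minimal sketch of the point about new objects, assuming a hypothetical immutable `Expr` node in place of RexCall: two separately constructed but structurally identical expressions are different references, yet a structural `equals` with a cached `hashCode` lets a `HashSet` treat them as the same key, and comparing the cached hashes first rejects most unequal pairs before any deep comparison.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical immutable expression node standing in for RexCall (not Calcite code).
final class Expr {
  final String op;
  final List<Expr> operands;
  private final int hash; // computed once, safe because the node is immutable

  Expr(String op, List<Expr> operands) {
    this.op = op;
    this.operands = List.copyOf(operands); // defensive, unmodifiable copy
    this.hash = Objects.hash(op, this.operands); // reuses operands' cached hashes
  }

  @Override public int hashCode() {
    return hash;
  }

  @Override public boolean equals(Object o) {
    if (this == o) {
      return true; // same reference: the cheap case when a node is merely copied
    }
    if (!(o instanceof Expr)) {
      return false;
    }
    Expr that = (Expr) o;
    return hash == that.hash // cheap rejection before the deep comparison
        && op.equals(that.op)
        && operands.equals(that.operands);
  }
}

final class EqualsDemo {
  public static void main(String[] args) {
    Expr a = new Expr("+", List.of(new Expr("$0", List.of()), new Expr("1", List.of())));
    // Rebuilt from scratch, as might happen after a rewrite such as column pruning.
    Expr b = new Expr("+", List.of(new Expr("$0", List.of()), new Expr("1", List.of())));
    Set<Expr> seen = new HashSet<>();
    seen.add(a);
    System.out.println(a == b);           // false: different objects
    System.out.println(seen.contains(b)); // true: structurally equal
  }
}
```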

> Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
> ---------------------------------------------------------------------------------
>
>                 Key: CALCITE-3786
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3786
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 1.21.0
>            Reporter: Vladimir Sitnikov
>            Assignee: Danny Chen
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Current digests for RexNode, RelNode, RelType, and similar cases use String 
> concatenation.
> It is easy to implement; however, it has drawbacks:
> 1) String objects cannot be reused. For instance, RexCall has operands, yet 
> their digests are duplicated inside the RexCall's digest. This causes extra 
> memory use and extra CPU for string copying.
> 2) There's no way to have multiple #toString() methods. RelType might need 
> multiple digests: "including field names", "excluding field names".
> A suggested resolution might be along the lines of
> {code:java}
> class Digest { // immutable
>   final int hashCode; // speedup hashCode and equals
>   final Object[] contents; // The values are either other Digest objects or Strings
>   String toString(); // e.g. for debugging purposes
>   int compareTo(Digest other); // e.g. for debugging purposes.
> }
> {code}
> Note how the fields are aligned much better in Kotlin, which makes it easier 
> to read:
> {code:java}
> class Digest { // immutable
>   val hashCode: Int // speedup hashCode and equals
>   val contents: Array<Any> // The values are either other Digest objects or Strings
>   fun toString(): String // e.g. for debugging purposes
>   fun compareTo(other: Digest): Int // e.g. for debugging purposes.
> }
> {code}
> Then the digest for RexCall could be the bits relevant to RexCall itself plus 
> the digests of its operands (which can be reused as-is).
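
A minimal runnable sketch of the `Digest` proposed above, assuming only what the description states (immutable, a precomputed hash, contents that are either Strings or nested Digests). It should not be read as Calcite's actual implementation; the varargs constructor, the validation, the element-by-element `compareTo` ordering, and the `DigestDemo` class are illustrative choices.

```java
import java.util.Arrays;

// Sketch of the Digest proposed in the issue (not Calcite's actual implementation).
// contents holds Strings or other Digest objects; nested Digests are shared by reference.
final class Digest implements Comparable<Digest> {
  private final Object[] contents;
  private final int hashCode; // precomputed to speed up hashCode and equals

  Digest(Object... contents) {
    for (Object c : contents) {
      if (!(c instanceof String) && !(c instanceof Digest)) {
        throw new IllegalArgumentException("expected String or Digest, got: " + c);
      }
    }
    this.contents = contents.clone(); // shallow copy keeps the Digest immutable
    this.hashCode = Arrays.hashCode(this.contents);
  }

  @Override public int hashCode() {
    return hashCode;
  }

  @Override public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof Digest)) {
      return false;
    }
    Digest that = (Digest) o;
    return hashCode == that.hashCode && Arrays.equals(contents, that.contents);
  }

  /** Renders the digest on demand, e.g. for debugging; the result is not cached. */
  @Override public String toString() {
    StringBuilder sb = new StringBuilder();
    appendTo(sb);
    return sb.toString();
  }

  private void appendTo(StringBuilder sb) {
    for (Object c : contents) {
      if (c instanceof Digest) {
        ((Digest) c).appendTo(sb);
      } else {
        sb.append((String) c);
      }
    }
  }

  /** Element-by-element ordering; returns 0 exactly when equals is true. */
  @Override public int compareTo(Digest other) {
    int n = Math.min(contents.length, other.contents.length);
    for (int i = 0; i < n; i++) {
      int c = compareElements(contents[i], other.contents[i]);
      if (c != 0) {
        return c;
      }
    }
    return Integer.compare(contents.length, other.contents.length);
  }

  private static int compareElements(Object a, Object b) {
    if (a instanceof String && b instanceof String) {
      return ((String) a).compareTo((String) b);
    }
    if (a instanceof Digest && b instanceof Digest) {
      return ((Digest) a).compareTo((Digest) b);
    }
    return (a instanceof String) ? -1 : 1; // arbitrary fixed choice: Strings first
  }
}

final class DigestDemo {
  public static void main(String[] args) {
    // The call's digest = its own bits + the operands' digests (held by reference, not copied).
    Digest call = new Digest("+(", new Digest("$0"), ", ", new Digest("1"), ")");
    Digest same = new Digest("+(", new Digest("$0"), ", ", new Digest("1"), ")");
    System.out.println(call);                 // +($0, 1)
    System.out.println(call.equals(same));    // true
    System.out.println(call.compareTo(same)); // 0
  }
}
```

Because operand digests are held by reference, building a call's digest copies no operand text; the flat string is produced only when `toString()` is called.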



--
This message was sent by Atlassian Jira
(v8.3.4#803005)