[ 
https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133799#comment-17133799
 ] 

Haisheng Yuan commented on CALCITE-3786:
----------------------------------------

{quote}

Even with these additional object references, how many join nodes are there 
during planning? They are not expected to be the cause of the OOM. What 
causes the OOM is the digest strings, especially for complex RexCalls: before 
this patch, their string digests were always copied into the digest of 
the RelNode, which would consume a lot of memory.

{quote}

I am not saying this proposal causes the OOM. On the contrary, it reduces 
memory usage compared with the string digest: as long as we stop using string 
digests for RexNode, OOM issues like 3784 will ease a lot. However, the 
so-called Digest still wastes a lot of memory unnecessarily. If you think there 
are not that many Join nodes, how about Project? Try computing the Digest size 
of a Project; it may consume much more memory. 
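
To make the memory argument concrete, here is a minimal sketch of why eagerly 
stored string digests get expensive. The Node class and its leaf/call factories 
are hypothetical stand-ins, not Calcite's RexNode API; the only assumption is 
that every node keeps a string that embeds a copy of each operand's string.

```java
final class DigestSizeSketch {
    /** Hypothetical expression node that eagerly stores its full string digest. */
    static final class Node {
        final String digest;

        private Node(String digest) {
            this.digest = digest;
        }

        static Node leaf(String name) {
            return new Node(name);
        }

        static Node call(String op, Node... operands) {
            StringBuilder sb = new StringBuilder(op).append('(');
            for (int i = 0; i < operands.length; i++) {
                if (i > 0) {
                    sb.append(", ");
                }
                // Copies the operand's whole digest into the parent's digest.
                sb.append(operands[i].digest);
            }
            return new Node(sb.append(')').toString());
        }
    }

    /** Builds +(+(+($0, $0), $0), ...) and sums the digest lengths of all nodes. */
    static long totalDigestChars(int depth) {
        Node leaf = Node.leaf("$0");
        Node current = leaf;
        long total = leaf.digest.length();
        for (int i = 0; i < depth; i++) {
            current = Node.call("+", current, leaf);
            total += current.digest.length();
        }
        return total;
    }

    public static void main(String[] args) {
        for (int depth : new int[] {10, 100, 1000}) {
            System.out.println(depth + " nested calls -> "
                + totalDigestChars(depth) + " digest chars");
        }
    }
}
```

With three nested calls the digests are 2, 9, 16 and 23 characters long, 50 in 
total; the total grows quadratically with nesting depth because each level 
copies everything below it.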



{quote}

Why do you think we should use #hashCode and #equals to decide that two 
relational expressions are semantically equivalent? Or, to ask it in a 
different way: if two relational expressions are equivalent, should they be 
equal to each other? (For example, two Projects that differ only in field 
aliases.)

{quote}

Why not? Although this is not mandatory, it is intuitive and natural. Two 
Projects with different aliases can be equal to each other: as long as they 
produce the same hash code and compare equal, we treat them as the same object, 
and in the MEMO we can use one instance to represent the other.
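
A minimal sketch of what I mean, under stated assumptions: the Project class 
below is a hypothetical stand-in (not Calcite's Project), the input is just a 
string, and the memo is a plain HashMap rather than VolcanoPlanner's actual 
data structure. Aliases are deliberately left out of equals and hashCode.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class MemoSketch {
    /** Hypothetical Project node whose equality ignores field aliases. */
    static final class Project {
        final String input;
        final List<String> exprs;
        final List<String> aliases;
        private final int hash; // computed once, aliases excluded

        Project(String input, List<String> exprs, List<String> aliases) {
            this.input = input;
            this.exprs = new ArrayList<>(exprs);
            this.aliases = new ArrayList<>(aliases);
            this.hash = Objects.hash(this.input, this.exprs);
        }

        @Override public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof Project)) {
                return false;
            }
            Project other = (Project) o;
            // Cheap hash check first, then the structural comparison.
            return hash == other.hash
                && input.equals(other.input)
                && exprs.equals(other.exprs);
        }

        @Override public int hashCode() {
            return hash;
        }
    }

    private final Map<Project, Project> memo = new HashMap<>();

    /** Returns the canonical instance for p, registering p if none exists yet. */
    Project register(Project p) {
        Project existing = memo.putIfAbsent(p, p);
        return existing != null ? existing : p;
    }

    int size() {
        return memo.size();
    }
}
```

Registering two Projects over the same input with the same expressions but 
different aliases leaves one entry in the memo, and the second call returns the 
first instance.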



{quote}

BTW, even if the RexCalls are new objects, using equals to compare them would 
solve the problem.

{quote}

Exactly, that is the problem. It is a deep comparison, which may cause 
performance issues for large, complex queries. Note that different rules can 
generate lots of intermediate RelNodes and RexNodes that are equal but are 
distinct instances, especially when Project/Filter/Calc merges happen on 
physical operators in VolcanoPlanner.

BTW, does anyone know that [VoltDB|https://www.voltdb.com/company/why-voltdb/], 
an in-memory OLTP RDBMS, is 
[experimenting|https://github.com/VoltDB/voltdb/tree/master/src/frontend/org/voltdb/plannerv2]
 with integrating Calcite into its system?

If Calcite wants to be versatile across OLTP, OLAP, batch, and streaming, every 
bit of memory counts, and every millisecond counts.

I am casting a "-1" on this proposal; the justification has been provided in 
this and previous comments. I am just expressing my objection to this idea; I 
don't have the right to prevent anyone from doing anything, though.

> Add Digest interface to enable efficient hashCode(equals) for RexNode and 
> RelNode
> ---------------------------------------------------------------------------------
>
>                 Key: CALCITE-3786
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3786
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 1.21.0
>            Reporter: Vladimir Sitnikov
>            Assignee: Danny Chen
>            Priority: Major
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Current digests for RexNode, RelNode, RelType, and similar cases use String 
> concatenation.
> It is easy to implement, however, it has drawbacks:
> 1) String objects cannot be reused. For instance, RexCall has operands, 
> however, the digest is duplicated. It causes extra memory use and extra CPU 
> for string copying
> 2) There's no way to have multiple #toString() methods. RelType might need 
> multiple digests: "including field names", "excluding field names".
> A suggested resolution might be along the lines of
> {code:java}
> class Digest { // immutable
>   final int hashCode; // speedup hashCode and equals
>   // The values are either other Digest objects or Strings
>   final Object[] contents;
>   String toString(); // e.g. for debugging purposes
>   int compareTo(Digest); // e.g. for debugging purposes.
> }
> {code}
> Note how fields in Kotlin are aligned much better, and it makes it easier to 
> read:
> {code:java}
> class Digest { // immutable
>   val hashCode: Int // speedup hashCode and equals
>   // The values are either other Digest objects or Strings
>   val contents: Array<Any>
>   fun toString(): String // e.g. for debugging purposes
>   fun compareTo(other: Digest): Int // e.g. for debugging purposes.
> }
> {code}
> Then the digest for RexCall could be the bits relevant to RexCall itself + 
> digests of the operands (which can be reused as is)
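
For reference, the proposal quoted above could be fleshed out roughly as 
follows. This is only a sketch of the idea in the issue description, not the 
code from the actual patch: the varargs constructor is an assumption, and 
compareTo is omitted. The hash is computed once, and operand digests are shared 
by reference rather than copied as strings.

```java
import java.util.Arrays;

/** Immutable digest whose contents are Strings or other Digest objects. */
final class Digest {
    private final int hashCode;      // cached at construction
    private final Object[] contents; // Strings or nested Digests

    Digest(Object... contents) {
        this.contents = contents.clone();
        // Nested Digests contribute their cached hash, so this is shallow.
        this.hashCode = Arrays.hashCode(this.contents);
    }

    @Override public int hashCode() {
        return hashCode;
    }

    @Override public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof Digest)) {
            return false;
        }
        Digest other = (Digest) o;
        // Different hashes rule out equality without walking the contents.
        return hashCode == other.hashCode
            && Arrays.equals(contents, other.contents);
    }

    /** Builds the string form on demand, e.g. for debugging. */
    @Override public String toString() {
        StringBuilder sb = new StringBuilder();
        for (Object c : contents) {
            sb.append(c);
        }
        return sb.toString();
    }
}
```

For example, new Digest("+(", ref0, ", ", lit1, ")") with ref0 = new 
Digest("$0") and lit1 = new Digest("1") prints as +($0, 1) while holding only 
references to the two operand digests. Note that equals still has to walk the 
contents when the hashes match and the instances differ.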



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
