[
https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133799#comment-17133799
]
Haisheng Yuan commented on CALCITE-3786:
----------------------------------------
{quote}
Even with these additional object references, how many join nodes are there
during planning? They are not expected to be the cause of the OOM. What causes
the OOM is the digest strings, especially for complex RexCalls: before this
patch, their string digests were always concatenated into the digest of the
RelNode, which consumed a lot of memory.
{quote}
I am not saying this proposal causes the OOM; on the contrary, it reduces memory
usage compared with the string digest, and as long as we stop using string
digests for RexNode, OOM issues like 3784 will ease a lot. However, the
so-called Digest still wastes a lot of memory unnecessarily. If you think there
are not that many Join nodes, how about Project? Try computing the Digest size
of a Project; it may consume much more memory.
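To put rough numbers on the string-digest argument, here is a small Java sketch. The class and method names are hypothetical, and the "+(child, 1)" format only imitates a RexCall digest rather than reproducing Calcite's exact output. It counts the characters held when every node of a nested expression keeps its own concatenated string digest, versus a structural digest where each node keeps only its own tokens and a reference to its child's digest.

```java
/**
 * Hypothetical sketch: total characters held when each node of a left-deep
 * expression chain +(+(+($0, 1), 1), 1) keeps its own string digest, versus
 * a structural digest where each node stores only its own tokens plus a
 * reference to its child's digest.
 */
final class DigestMemorySketch {

  /** String digests: every parent copies its child's whole digest. */
  static long totalStringDigestChars(int depth) {
    String digest = "$0";               // leaf digest
    long total = digest.length();
    for (int i = 0; i < depth; i++) {
      digest = "+(" + digest + ", 1)";  // new String containing a full copy
      total += digest.length();
    }
    return total;
  }

  /**
   * Structural digests: each parent holds its own operator and literal and
   * points at the child's digest, so no characters are copied.
   * Per-object header and reference overhead is ignored here.
   */
  static long totalStructuralChars(int depth) {
    long total = "$0".length();
    for (int i = 0; i < depth; i++) {
      total += "+".length() + "1".length();
    }
    return total;
  }
}
```

For a chain of depth n, the string version holds (n+1)(3n+2) characters (352 at depth 10, about 3 million at depth 1000), while the structural version holds 2n+2. The object headers and references that the sketch ignores are exactly the overhead the Digest objects themselves add, so the real saving is smaller than these counts suggest.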
{quote}
Why do you think we should use #hashCode and #equals to decide whether two
relational expressions are semantically equivalent? Or, to ask it a different
way: if two relational expressions are equivalent, should they be equal to each
other? (For example, two Projects that differ only in field aliases.)
{quote}
Why not? Although this is not mandatory, it is intuitive and natural. Two
Projects with different aliases can be equal to each other: as long as they
produce the same hash code and #equals returns true, we treat them as equal
objects, and in the MEMO we can use one instance to represent the other.
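A minimal sketch of what "equal despite different aliases" could look like, assuming a hypothetical `SimpleProject` whose `hashCode`/`equals` cover the input and the expressions but deliberately skip field names, and a hypothetical `Memo` that keeps one canonical instance per equal Project. Neither is Calcite's Project or VolcanoPlanner API. It needs Java 10+ for `List.copyOf`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical Project whose equality deliberately ignores field aliases. */
final class SimpleProject {
  final int inputId;              // identifies the input in the MEMO
  final List<String> exprs;       // expression digests, e.g. "$0", "+($1, 1)"
  final List<String> fieldNames;  // aliases: NOT part of hashCode/equals
  private final int hash;         // cached once, since the object is immutable

  SimpleProject(int inputId, List<String> exprs, List<String> fieldNames) {
    this.inputId = inputId;
    this.exprs = List.copyOf(exprs);
    this.fieldNames = List.copyOf(fieldNames);
    this.hash = Objects.hash(inputId, this.exprs);
  }

  @Override public int hashCode() { return hash; }

  @Override public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof SimpleProject)) return false;
    SimpleProject that = (SimpleProject) o;
    return hash == that.hash          // cheap rejection first
        && inputId == that.inputId
        && exprs.equals(that.exprs);  // fieldNames intentionally skipped
  }
}

/** Hypothetical MEMO table that keeps one canonical instance per equal Project. */
final class Memo {
  private final Map<SimpleProject, SimpleProject> projects = new HashMap<>();

  /** Returns the already-registered equal Project if there is one, else p. */
  SimpleProject register(SimpleProject p) {
    SimpleProject existing = projects.putIfAbsent(p, p);
    return existing != null ? existing : p;
  }

  int size() { return projects.size(); }
}
```

Registering two Projects that differ only in aliases returns the same instance, so the MEMO holds a single entry; the canonical instance keeps the aliases of whichever Project was registered first.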
{quote}
BTW, even if the RexCalls are new objects, comparing them with equals would
solve the problem.
{quote}
Exactly, that is the problem: it is a deep comparison, which may cause
performance issues for large, complex queries. Note that different rules can
generate lots of intermediate RelNodes and RexNodes that are equal but are
different instances, especially when Project/Filter/Calc merges happen on
physical operators in VolcanoPlanner.
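The cost difference between deep comparison and shared instances can be illustrated with a hypothetical `Expr` tree that caches its hash code and counts how many nodes `equals` visits. To canonicalize instances, the sketch uses hash-consing (interning children before parents); that technique is named here only for illustration and is not claimed to be what this comment proposes or what Calcite provides.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical expression node: cached hash plus deep structural equals. */
final class Expr {
  static long nodeComparisons = 0;  // instrumentation only, not thread-safe

  final String op;
  final Expr[] operands;
  private final int hash;           // computed once from op and operand hashes

  Expr(String op, Expr... operands) {
    this.op = op;
    this.operands = operands.clone();
    this.hash = 31 * op.hashCode() + Arrays.hashCode(this.operands);
  }

  @Override public int hashCode() { return hash; }

  @Override public boolean equals(Object o) {
    if (this == o) return true;            // shared instance: O(1)
    if (!(o instanceof Expr)) return false;
    Expr that = (Expr) o;
    if (hash != that.hash) return false;   // different hash: cheap rejection
    nodeComparisons++;                     // equal hashes force a deep check
    return op.equals(that.op) && Arrays.equals(operands, that.operands);
  }
}

/** Hash-consing: keeps one canonical instance per structurally equal tree. */
final class ExprInterner {
  private final Map<Expr, Expr> canonical = new HashMap<>();

  Expr intern(Expr e) {
    Expr existing = canonical.putIfAbsent(e, e);
    return existing != null ? existing : e;
  }

  /** Builds +(+(+($0, 1), 1), 1)-style chains; interns bottom-up if asked. */
  static Expr chain(int depth, ExprInterner interner) {
    Expr e = canon(new Expr("$0"), interner);
    for (int i = 0; i < depth; i++) {
      Expr one = canon(new Expr("1"), interner);
      e = canon(new Expr("+", e, one), interner);
    }
    return e;
  }

  private static Expr canon(Expr e, ExprInterner interner) {
    return interner == null ? e : interner.intern(e);
  }
}
```

Comparing two independently built chains of depth 100 visits 201 nodes, whereas two chains built through the same interner end up as the same instance and compare in O(1). Interning still pays one shallow `equals` per new node at insertion time, since its children are already canonical and compare by reference.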
BTW, did anyone know that [VoltDB|https://www.voltdb.com/company/why-voltdb/],
an in-memory OLTP RDBMS, is
[experimenting|https://github.com/VoltDB/voltdb/tree/master/src/frontend/org/voltdb/plannerv2]
with integrating Calcite into its system?
If Calcite wants to be versatile across OLTP, OLAP, batch, and streaming
workloads, every bit of memory counts, and every millisecond counts.
I am casting a "-1" on this proposal; the justification has been provided in
this and previous comments. I am just expressing my objection to this idea,
though; I don't have the right to prevent anyone from doing anything.
> Add Digest interface to enable efficient hashCode(equals) for RexNode and
> RelNode
> ---------------------------------------------------------------------------------
>
> Key: CALCITE-3786
> URL: https://issues.apache.org/jira/browse/CALCITE-3786
> Project: Calcite
> Issue Type: New Feature
> Components: core
> Affects Versions: 1.21.0
> Reporter: Vladimir Sitnikov
> Assignee: Danny Chen
> Priority: Major
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Current digests for RexNode, RelNode, RelType, and similar cases use String
> concatenation.
> It is easy to implement; however, it has drawbacks:
> 1) String objects cannot be reused. For instance, RexCall has operands, yet
> each operand's digest is copied again into the call's digest. This causes
> extra memory use and extra CPU for string copying.
> 2) There's no way to have multiple #toString() methods. RelType might need
> multiple digests: "including field names", "excluding field names".
> A suggested resolution might be along the lines of
> {code:java}
> class Digest implements Comparable<Digest> { // immutable
>   final int hashCode; // speeds up hashCode and equals
>   final Object[] contents; // the values are either other Digest objects or Strings
>   public String toString(); // e.g. for debugging purposes
>   public int compareTo(Digest other); // e.g. for debugging purposes
> }
> {code}
> Note how the fields are aligned much better in Kotlin, which makes it easier
> to read:
> {code:java}
> class Digest : Comparable<Digest> { // immutable
>   val hashCode: Int // speeds up hashCode and equals
>   val contents: Array<Any> // the values are either other Digest objects or Strings
>   override fun toString(): String // e.g. for debugging purposes
>   override fun compareTo(other: Digest): Int // e.g. for debugging purposes
> }
> {code}
> Then the digest for a RexCall could consist of the bits relevant to the
> RexCall itself plus the digests of its operands (which can be reused as is).
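For reference, here is a minimal runnable sketch of the Digest proposed in the issue description, assuming that `contents` holds only `String`s and nested `Digest`s and that the text form is built by concatenating them in order. `CallDigests.call` is a hypothetical helper showing how a RexCall-like digest can reuse its operand digests as is; neither class is part of Calcite.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical implementation of the immutable Digest from the proposal. */
final class Digest implements Comparable<Digest> {
  private final int hashCode;       // computed once: speeds up hashCode and equals
  private final Object[] contents;  // each element is a String or a Digest

  Digest(Object... contents) {
    for (Object c : contents) {
      if (!(c instanceof String) && !(c instanceof Digest)) {
        throw new IllegalArgumentException("expected String or Digest: " + c);
      }
    }
    this.contents = contents.clone();  // defensive copy keeps the digest immutable
    this.hashCode = Arrays.hashCode(this.contents);  // uses children's cached hashes
  }

  @Override public int hashCode() { return hashCode; }

  @Override public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof Digest)) return false;
    Digest that = (Digest) o;
    // Different hashes reject in O(1); otherwise compare element by element.
    return hashCode == that.hashCode && Arrays.equals(contents, that.contents);
  }

  /** Text form, e.g. for debugging; nested digests are expanded on demand. */
  @Override public String toString() {
    StringBuilder sb = new StringBuilder();
    appendTo(sb);
    return sb.toString();
  }

  private void appendTo(StringBuilder sb) {
    for (Object c : contents) {
      if (c instanceof Digest) {
        ((Digest) c).appendTo(sb);
      } else {
        sb.append((String) c);
      }
    }
  }

  /** Orders by text, e.g. for debugging; NOT consistent with equals. */
  @Override public int compareTo(Digest other) {
    return toString().compareTo(other.toString());
  }
}

/** Hypothetical helper: a RexCall-like digest = own bits + operand digests. */
final class CallDigests {
  static Digest call(String op, Digest... operands) {
    List<Object> parts = new ArrayList<>();
    parts.add(op + "(");
    for (int i = 0; i < operands.length; i++) {
      if (i > 0) {
        parts.add(", ");
      }
      parts.add(operands[i]);  // the operand's Digest is reused as is
    }
    parts.add(")");
    return new Digest(parts.toArray());
  }
}
```

For example, with `x = new Digest("$0")` and `one = new Digest("1")`, `call("*", call("+", x, one), x)` renders as `*(+($0, 1), $0)`, while the outer Digest only holds references to the inner ones. Ordering by text keeps `compareTo` simple, but two structurally different digests with the same text (say `("ab")` and `("a", "b")`) compare as 0 without being equal.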
--
This message was sent by Atlassian Jira
(v8.3.4#803005)