[
https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140010#comment-17140010
]
Haisheng Yuan commented on CALCITE-3786:
----------------------------------------
I think my objection on this proposal might be mis-interpreted, maybe because I
was not clear on myself. I am objecting it, not because it won't reduce memory
consumption, or it will make things worse than current master. Quite the
opposite, I believe with the proposal, memory footprint will be reduced and
improved, which I already stated in above comments. As long as we stop using
string digest, things will become much better, without any doubt. However, is
this the best approach? I don't think so. So no matter what the performance
result is, my stance on the proposal doesn't change. I am still against it.
Because this is not the most appropriate way we would adopt.
[~danny0405] has worked a lot on this issue, and we all appreciate the amount
of work he has done. He is very smart and diligent, I couldn't do it as good as
he can on this matter. Even [~laurent] and I casted -1 on it, and people
requested Danny to revert the change, but since he refused to revert it, let's
just keep it in master. I believe that Danny believes his change is beneficial
for all Calcite users, all with good intention. [~danny0405], Laurent made some
code review comments, to try to help improve your code, even some are minor
optimization, can't reflect on the benchmark, but definitely won't make your
code run slower.
What if we all take a step back, suppose this JIRA is about to improve the
memory consumption, there are several steps here. We have finished the first
step, thanks [~danny0405] for taking the initiative to complete the most
important 1st step. Now let's think the next step. Is there anything that we
can further improve based the first step? Some viewpoints (or questions) that I
share with [~laurent] are:
1. Can the field of Digest be optimized to use less memory?
2. Can operators customize their own identity?
3. Can operators customize the string representation by omitting some fields?
4. Can RexNode be further improved?
Here is the the next step draft change I have in my mind based the above
questions and Laurent's comments. Can someone take a look? [~danny0405] Feel
free to take from it if you think there are some improvements:
https://github.com/apache/calcite/commit/f970df50eb69389bbc81bd94e0c8f9da033e5e01
The main change is RelDigest.java, AbstractRelNode.java and RelSubset.java.
> Add Digest interface to enable efficient hashCode(equals) for RexNode and
> RelNode
> ---------------------------------------------------------------------------------
>
> Key: CALCITE-3786
> URL: https://issues.apache.org/jira/browse/CALCITE-3786
> Project: Calcite
> Issue Type: New Feature
> Components: core
> Affects Versions: 1.21.0
> Reporter: Vladimir Sitnikov
> Assignee: Danny Chen
> Priority: Major
> Fix For: 1.24.0
>
> Time Spent: 6h 20m
> Remaining Estimate: 0h
>
> Current digests for RexNode, RelNode, RelType, and similar cases use String
> concatenation.
> It is easy to implement, however, it has drawbacks:
> 1) String objects cannot be reused. For instance, RexCall has operands,
> however, the digest is duplicated. It causes extra memory use and extra CPU
> for string copying
> 2) There's no way to have multiple #toString() methods. RelType might need
> multiple digests: "including field names", "excluding field names".
> A suggested resolution might be behind the lines of
> {code:java}
> class Digest { // immutable
> final int hashCode; // speedup hashCode and equals
> final Object[] contents; // The values are either other Digest objects or
> Strings
> String toString(); // e.g. for debugging purposes
> int compareTo(Digest); // e.g. for debugging purposes.
> }
> {code}
> Note how fields in Kotlin are aligned much better, and it makes it easier to
> read:
> {code:java}
> class Digest { // immutable
> val hashCode: Int // speedup hashCode and equals
> val contents: Array<Any> // The values are either other Digest objects or
> Strings
> fun toString(): String // e.g. for debugging purposes
> fun compareTo(other: Digest): Int // e.g. for debugging purposes.
> }
> {code}
> Then the digest for RexCall could be the bits relevant to RexCall itself +
> digests of the operands (which can be reused as is)
--
This message was sent by Atlassian Jira
(v8.3.4#803005)