[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode

Haisheng Yuan (Jira) Thu, 18 Jun 2020 14:55:22 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140010#comment-17140010
 ]


Haisheng Yuan commented on CALCITE-3786:
----------------------------------------

I think my objection on this proposal might be mis-interpreted, maybe because I 
was not clear on myself. I am objecting it, not because it won't reduce memory 
consumption, or it will make things worse than current master. Quite the 
opposite, I believe with the proposal, memory footprint will be reduced and 
improved, which I already stated in above comments. As long as we stop using 
string digest, things will become much better, without any doubt. However, is 
this the best approach? I don't think so. So no matter what the performance 
result is, my stance on the proposal doesn't change. I am still against it. 
Because this is not the most appropriate way we would adopt.

 [~danny0405] has worked a lot on this issue, and we all appreciate the amount 
of work he has done. He is very smart and diligent, I couldn't do it as good as 
he can on this matter. Even [~laurent] and I casted -1 on it, and people 
requested Danny to revert the change, but since he refused to revert it, let's 
just keep it in master. I believe that Danny believes his change is beneficial 
for all Calcite users, all with good intention. [~danny0405], Laurent made some 
code review comments, to try to help improve your code, even some are minor 
optimization, can't reflect on the benchmark, but definitely won't make your 
code run slower. 

What if we all take a step back, suppose this JIRA is about to improve the 
memory consumption, there are several steps here. We have finished the first 
step, thanks [~danny0405] for taking the initiative to complete the most 
important 1st step. Now let's think the next step. Is there anything that we 
can further improve based the first step? Some viewpoints (or questions) that I 
share with [~laurent] are:
1. Can the field of Digest be optimized to use less memory?
2. Can operators customize their own identity?
3. Can operators customize the string representation by omitting some fields?
4. Can RexNode be further improved?

Here is the the next step draft change I have in my mind based the above 
questions and Laurent's comments. Can someone take a look? [~danny0405] Feel 
free to take from it if you think there are some improvements:
https://github.com/apache/calcite/commit/f970df50eb69389bbc81bd94e0c8f9da033e5e01

The main change is RelDigest.java, AbstractRelNode.java and RelSubset.java.

> Add Digest interface to enable efficient hashCode(equals) for RexNode and 
> RelNode
> ---------------------------------------------------------------------------------
>
>                 Key: CALCITE-3786
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3786
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 1.21.0
>            Reporter: Vladimir Sitnikov
>            Assignee: Danny Chen
>            Priority: Major
>             Fix For: 1.24.0
>
>          Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> Current digests for RexNode, RelNode, RelType, and similar cases use String 
> concatenation.
> It is easy to implement, however, it has drawbacks:
> 1) String objects cannot be reused. For instance, RexCall has operands, 
> however, the digest is duplicated. It causes extra memory use and extra CPU 
> for string copying
> 2) There's no way to have multiple #toString() methods. RelType might need 
> multiple digests: "including field names", "excluding field names".
> A suggested resolution might be behind the lines of
> {code:java}
> class Digest { // immutable
>   final int hashCode; // speedup hashCode and equals
>   final Object[] contents; // The values are either other Digest objects or 
> Strings
>   String toString(); // e.g. for debugging purposes
>   int compareTo(Digest); // e.g. for debugging purposes.
> }
> {code}
> Note how fields in Kotlin are aligned much better, and it makes it easier to 
> read:
> {code:java}
> class Digest { // immutable
>   val hashCode: Int // speedup hashCode and equals
>   val contents: Array<Any> // The values are either other Digest objects or 
> Strings
>   fun toString(): String // e.g. for debugging purposes
>   fun compareTo(other: Digest): Int // e.g. for debugging purposes.
> }
> {code}
> Then the digest for RexCall could be the bits relevant to RexCall itself + 
> digests of the operands (which can be reused as is)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode

Reply via email to