[ 
https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17142199#comment-17142199
 ] 

Haisheng Yuan commented on CALCITE-3786:
----------------------------------------

{quote}My example code just illustrates that row type for downstream may not 
have fields, but they must also obey the rule that the type should equal(in the 
version before 1.22, we compare with just row type, but later we switch to 
field list which may cause problem, the code fix that).
{quote}
So the only thing that downstream can do is wait. Wait your fix to be released 
with 1.24, because they have no choice to override the default behavior in 
their own operator, do they?
 Anyway, the digestEquals and digestHash has been changed to private completely 
per your request, should not be a problem that concerns you.
{quote}you have to cache the hashcode for RexNodes, it sacrifice performance 
for memory.
{quote}
I can change it to only cache hashcode to RexCall if you think it is wasting 
memory. Do you know that RexNode can be shared between operators, especially 
when logical operator is transformed to physical operator. Do you really want 
to compute hashCode for each RexNode again even they are shared?
{quote}i don't think #getDigest should be a public API, the digest is a planner 
internal abstraction
{quote}
I 100% agree with you. I don't think it should be a public API either. But it 
is already a public API, no matter you like it or not, since the first day.

Dude, don't be pissed off. You have reviewed my code multiple times, helped 
improve my PR and remove bunch of ugly codes and unnecessary subclasses. It is 
also a part of your work, isn't it? Look, I didn't revert your code at all, I 
just move your code from one place to another place, e.g. from Digest to 
AbstractRelNode, don't you agree? It is just a tiny refactoring on top of your 
contribution. Some of the change is just like I collect the money that you have 
left on the table. For example:
 - The RexCall toString() still stores string digest. When the toString is 
called, we still can't avoid OOM because of the digest there.
 - The Janino metadata code generator still calls RexNode#toString, it will use 
the large RexCall string as the key, until the metadata is cleared.

Do you really want to do benchmark test competition? Especially for memory 
consumption? It is not fair to you. You have done tons of work, made the money, 
but forgot to take the money. I am just helping you take the money that is left 
on the table by you.

> Add Digest interface to enable efficient hashCode(equals) for RexNode and 
> RelNode
> ---------------------------------------------------------------------------------
>
>                 Key: CALCITE-3786
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3786
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 1.21.0
>            Reporter: Vladimir Sitnikov
>            Assignee: Danny Chen
>            Priority: Major
>             Fix For: 1.24.0
>
>          Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> Current digests for RexNode, RelNode, RelType, and similar cases use String 
> concatenation.
> It is easy to implement, however, it has drawbacks:
> 1) String objects cannot be reused. For instance, RexCall has operands, 
> however, the digest is duplicated. It causes extra memory use and extra CPU 
> for string copying
> 2) There's no way to have multiple #toString() methods. RelType might need 
> multiple digests: "including field names", "excluding field names".
> A suggested resolution might be behind the lines of
> {code:java}
> class Digest { // immutable
>   final int hashCode; // speedup hashCode and equals
>   final Object[] contents; // The values are either other Digest objects or 
> Strings
>   String toString(); // e.g. for debugging purposes
>   int compareTo(Digest); // e.g. for debugging purposes.
> }
> {code}
> Note how fields in Kotlin are aligned much better, and it makes it easier to 
> read:
> {code:java}
> class Digest { // immutable
>   val hashCode: Int // speedup hashCode and equals
>   val contents: Array<Any> // The values are either other Digest objects or 
> Strings
>   fun toString(): String // e.g. for debugging purposes
>   fun compareTo(other: Digest): Int // e.g. for debugging purposes.
> }
> {code}
> Then the digest for RexCall could be the bits relevant to RexCall itself + 
> digests of the operands (which can be reused as is)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to