[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode

Vladimir Sitnikov (Jira) Wed, 17 Jun 2020 03:01:21 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138302#comment-17138302
 ]


Vladimir Sitnikov commented on CALCITE-3786:
--------------------------------------------

{quote}I used the max used heap mem as the metric, is there better way to 
compare the memory there ?{quote}

1) gc.alloc.rate.norm  -- that is the metric for "allocation rate per benchmark 
execution"
2) The memory consumption (e.g. the number of bytes a rel consumes in heap) is 
to be measured with Eclipse Memory Analyzer or with 
https://openjdk.java.net/projects/code-tools/jol/

{quote} is an impressive improvement(20x) for performance{quote}

Can you please clarify what is the number of objects in {{DigestToRelMap}}?

What is the purpose of StringDigest? I guess it should not impact much, but it 
looks excessive, and it was not there in the old implementation.

The benchmark might suffer from dead code elimination, please follow 
http://hg.openjdk.java.net/code-tools/jmh/file/tip/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_08_DeadCode.java

Please add a test case when the rel is built dynamically.
With calcite, we often build a relation, and then it is queried in the cache. 
It looks like the current benchmark measures use the same instances for the 
initial data population and for the measurements.


Please prefer shorter benchmark names and benchmark parameter names.
The current result table is very wide, and it is hard to follow.

For instance: isStringDigest could be refactored into

{code:java}
enum DigestType { STRING, OBJECT }
@Param DigestType digest;
{code}


> Add Digest interface to enable efficient hashCode(equals) for RexNode and 
> RelNode
> ---------------------------------------------------------------------------------
>
>                 Key: CALCITE-3786
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3786
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 1.21.0
>            Reporter: Vladimir Sitnikov
>            Assignee: Danny Chen
>            Priority: Major
>             Fix For: 1.24.0
>
>          Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Current digests for RexNode, RelNode, RelType, and similar cases use String 
> concatenation.
> It is easy to implement, however, it has drawbacks:
> 1) String objects cannot be reused. For instance, RexCall has operands, 
> however, the digest is duplicated. It causes extra memory use and extra CPU 
> for string copying
> 2) There's no way to have multiple #toString() methods. RelType might need 
> multiple digests: "including field names", "excluding field names".
> A suggested resolution might be behind the lines of
> {code:java}
> class Digest { // immutable
>   final int hashCode; // speedup hashCode and equals
>   final Object[] contents; // The values are either other Digest objects or 
> Strings
>   String toString(); // e.g. for debugging purposes
>   int compareTo(Digest); // e.g. for debugging purposes.
> }
> {code}
> Note how fields in Kotlin are aligned much better, and it makes it easier to 
> read:
> {code:java}
> class Digest { // immutable
>   val hashCode: Int // speedup hashCode and equals
>   val contents: Array<Any> // The values are either other Digest objects or 
> Strings
>   fun toString(): String // e.g. for debugging purposes
>   fun compareTo(other: Digest): Int // e.g. for debugging purposes.
> }
> {code}
> Then the digest for RexCall could be the bits relevant to RexCall itself + 
> digests of the operands (which can be reused as is)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode

Reply via email to