[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode

Danny Chen (Jira) Wed, 17 Jun 2020 06:28:18 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138427#comment-17138427
 ]


Danny Chen commented on CALCITE-3786:
-------------------------------------

Thanks, [~vladimirsitnikov] and [~zabetak]. I have addressed the comments in [1].

> Can you please clarify what is the number of objects in DigestToRelMap?

It holds the digest-to-node mappings for the rel and all of its inputs, in order 
to simulate the planner's node #register (I know there are also RelSubset nodes 
there, but the effect should be the same).
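
To make the setup concrete, the digest-to-node map can be pictured roughly as below. This is only a minimal sketch, not the benchmark code from [1]: `Node`, `DigestToNodeMap` and `register` are hypothetical names, a plain `HashMap` stands in for the planner's internal structures, and the digest shown is the String-concatenation flavour.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a relational node; not a Calcite class. */
class Node {
  final String name;
  final List<Node> inputs;

  Node(String name, List<Node> inputs) {
    this.name = name;
    this.inputs = inputs;
  }

  /** String-style digest: the node name followed by the digests of all inputs. */
  String digest() {
    StringBuilder sb = new StringBuilder(name).append('(');
    for (int i = 0; i < inputs.size(); i++) {
      if (i > 0) {
        sb.append(", ");
      }
      sb.append(inputs.get(i).digest());
    }
    return sb.append(')').toString();
  }
}

/** Hypothetical digest-to-node map that mimics a planner's register step. */
class DigestToNodeMap {
  private final Map<String, Node> map = new HashMap<>();

  /** Registers the node and all of its inputs; returns the canonical node for its digest. */
  Node register(Node node) {
    for (Node input : node.inputs) {
      register(input);
    }
    Node existing = map.putIfAbsent(node.digest(), node);
    return existing != null ? existing : node;
  }

  int size() {
    return map.size();
  }
}

class RegisterDemo {
  public static void main(String[] args) {
    Node scan = new Node("Scan", Collections.<Node>emptyList());
    Node filterA = new Node("Filter", Arrays.asList(scan));
    Node filterB = new Node("Filter", Arrays.asList(scan));

    DigestToNodeMap map = new DigestToNodeMap();
    map.register(filterA);
    // filterB has the same digest as filterA, so the earlier node is returned.
    System.out.println(map.register(filterB) == filterA);
    System.out.println(map.size());
  }
}
```

Every lookup in this sketch rebuilds and hashes the full digest string of the node, which is the cost the STRING rows of the benchmark are meant to expose.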

I have changed the level from Level.Invocation to Level.Iteration.

Here is the latest data:

Performance comparison (OBJECT vs. STRING digest):
{code:java}

Benchmark                               (digestType)  (disjunctions)  (joins)  Mode  Cnt          Score   Error  Units
DigestBenchmark.getRel                        OBJECT               1        1  avgt    5          0.123 ± 0.004  us/op
DigestBenchmark.getRel                        OBJECT               1       10  avgt    5          0.447 ± 0.023  us/op
DigestBenchmark.getRel                        OBJECT               1       20  avgt    5          0.868 ± 0.085  us/op
DigestBenchmark.getRel                        OBJECT              10        1  avgt    5          0.126 ± 0.014  us/op
DigestBenchmark.getRel                        OBJECT              10       10  avgt    5          0.459 ± 0.029  us/op
DigestBenchmark.getRel                        OBJECT              10       20  avgt    5          0.920 ± 0.147  us/op
DigestBenchmark.getRel                        OBJECT             100        1  avgt    5          0.119 ± 0.008  us/op
DigestBenchmark.getRel                        OBJECT             100       10  avgt    5          0.452 ± 0.030  us/op
DigestBenchmark.getRel                        OBJECT             100       20  avgt    5          0.857 ± 0.109  us/op

DigestBenchmark.getRel                        STRING               1        1  avgt    5          1.320 ± 0.049  us/op
DigestBenchmark.getRel                        STRING               1       10  avgt    5         10.588 ± 0.088  us/op
DigestBenchmark.getRel                        STRING               1       20  avgt    5         27.863 ± 0.320  us/op
DigestBenchmark.getRel                        STRING              10        1  avgt    5          1.352 ± 0.028  us/op
DigestBenchmark.getRel                        STRING              10       10  avgt    5         10.612 ± 0.286  us/op
DigestBenchmark.getRel                        STRING              10       20  avgt    5         27.865 ± 1.627  us/op
DigestBenchmark.getRel                        STRING             100        1  avgt    5          1.467 ± 0.683  us/op
DigestBenchmark.getRel                        STRING             100       10  avgt    5         10.738 ± 0.075  us/op
DigestBenchmark.getRel                        STRING             100       20  avgt    5         28.211 ± 0.449  us/op
{code}

The diff of memory usage:
{code:java}
Benchmark                               (digestType)  (disjunctions)  (joins)  Mode  Cnt          Score   Error  Units
DigestBenchmark.getRel:Max memory heap        OBJECT               1        1  avgt    5  228065280.000          bytes
DigestBenchmark.getRel:Max memory heap        OBJECT               1       10  avgt    5  211812352.000          bytes
DigestBenchmark.getRel:Max memory heap        OBJECT               1       20  avgt    5  215482368.000          bytes
DigestBenchmark.getRel:Max memory heap        OBJECT              10        1  avgt    5  239599616.000          bytes
DigestBenchmark.getRel:Max memory heap        OBJECT              10       10  avgt    5  218628096.000          bytes
DigestBenchmark.getRel:Max memory heap        OBJECT              10       20  avgt    5  257949696.000          bytes
DigestBenchmark.getRel:Max memory heap        OBJECT             100        1  avgt    5  258998272.000          bytes
DigestBenchmark.getRel:Max memory heap        OBJECT             100       10  avgt    5  211812352.000          bytes
DigestBenchmark.getRel:Max memory heap        OBJECT             100       20  avgt    5  213385216.000          bytes

DigestBenchmark.getRel:Max memory heap        STRING               1        1  avgt    5  300417024.000          bytes
DigestBenchmark.getRel:Max memory heap        STRING               1       10  avgt    5  262144000.000          bytes
DigestBenchmark.getRel:Max memory heap        STRING               1       20  avgt    5  242745344.000          bytes
DigestBenchmark.getRel:Max memory heap        STRING              10        1  avgt    5  317194240.000          bytes
DigestBenchmark.getRel:Max memory heap        STRING              10       10  avgt    5  273154048.000          bytes
DigestBenchmark.getRel:Max memory heap        STRING              10       20  avgt    5  258473984.000          bytes
DigestBenchmark.getRel:Max memory heap        STRING             100        1  avgt    5  386924544.000          bytes
DigestBenchmark.getRel:Max memory heap        STRING             100       10  avgt    5  262144000.000          bytes
DigestBenchmark.getRel:Max memory heap        STRING             100       20  avgt    5  235405312.000          bytes
{code}

I still report the max heap memory usage because it is the most straightforward 
metric for illustrating memory usage.

[1] 
https://github.com/danny0405/calcite/commit/fe7e82cfe9ab124ee6aad929367e09c755d3a967

> Add Digest interface to enable efficient hashCode(equals) for RexNode and 
> RelNode
> ---------------------------------------------------------------------------------
>
>                 Key: CALCITE-3786
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3786
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 1.21.0
>            Reporter: Vladimir Sitnikov
>            Assignee: Danny Chen
>            Priority: Major
>             Fix For: 1.24.0
>
>          Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Current digests for RexNode, RelNode, RelType, and similar cases use String 
> concatenation.
> It is easy to implement; however, it has drawbacks:
> 1) String objects cannot be reused. For instance, a RexCall has operands, 
> yet each operand's digest is copied into the RexCall's digest. This costs 
> extra memory and extra CPU for string copying.
> 2) There's no way to have multiple #toString() methods. RelType might need 
> multiple digests: "including field names", "excluding field names".
> A suggested resolution might be along the lines of
> {code:java}
> class Digest { // immutable
>   final int hashCode; // speedup hashCode and equals
>   final Object[] contents; // The values are either other Digest objects or Strings
>   String toString(); // e.g. for debugging purposes
>   int compareTo(Digest); // e.g. for debugging purposes.
> }
> {code}
> Note how fields in Kotlin are aligned much better, and it makes it easier to 
> read:
> {code:java}
> class Digest { // immutable
>   val hashCode: Int // speedup hashCode and equals
>   val contents: Array<Any> // The values are either other Digest objects or Strings
>   fun toString(): String // e.g. for debugging purposes
>   fun compareTo(other: Digest): Int // e.g. for debugging purposes.
> }
> {code}
> Then the digest for RexCall could be the bits relevant to RexCall itself + 
> digests of the operands (which can be reused as is)
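
For reference, the skeleton quoted above could be filled in roughly as follows. This is a minimal sketch under my own assumptions rather than the code in [1]: the varargs constructor, the use of `Arrays.hashCode`/`Arrays.equals`, and the string-based `compareTo` are illustrative choices, and `DigestDemo` is a hypothetical driver.

```java
import java.util.Arrays;

/** Hypothetical immutable digest; a sketch of the idea, not Calcite's API. */
final class Digest implements Comparable<Digest> {
  private final int hashCode;      // computed once, reused by hashCode() and equals()
  private final Object[] contents; // each element is a String or another Digest

  Digest(Object... contents) {
    this.contents = contents.clone();
    // Nested Digest elements contribute their already cached hash codes.
    this.hashCode = Arrays.hashCode(this.contents);
  }

  @Override public int hashCode() {
    return hashCode;
  }

  @Override public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof Digest)) {
      return false;
    }
    Digest other = (Digest) o;
    // Cheap reject on the cached hash before comparing element by element.
    return hashCode == other.hashCode && Arrays.equals(contents, other.contents);
  }

  /** Flattens the digest to text, e.g. for debugging purposes. */
  @Override public String toString() {
    StringBuilder sb = new StringBuilder();
    for (Object c : contents) {
      sb.append(c);
    }
    return sb.toString();
  }

  /** Orders by the flattened text; for debugging only, not consistent with equals(). */
  @Override public int compareTo(Digest other) {
    return toString().compareTo(other.toString());
  }
}

class DigestDemo {
  public static void main(String[] args) {
    Digest x = new Digest("$0");
    Digest y = new Digest("$1");
    // The call digest references the operand digests instead of copying their text.
    Digest plus = new Digest("+(", x, ", ", y, ")");
    Digest same = new Digest("+(", x, ", ", y, ")");
    System.out.println(plus);              // prints +($0, $1)
    System.out.println(plus.equals(same)); // prints true
  }
}
```

The operand digests `x` and `y` are shared by reference, so building the digest of the call allocates only one small array and no string is produced unless `toString()` is actually called.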



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
