[ 
https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138289#comment-17138289
 ] 

Danny Chen commented on CALCITE-3786:
-------------------------------------

Hi, [~vladimirsitnikov] ~

I wrote a benchmark [1] to compare the performance and memory usage of the 
pure string digest against the new Digest structure.


{code:xml}
Benchmark                                                 (isStringDigest)  (joins)  (whereClauseDisjunctions)  Mode  Cnt          Score   Error  Units
DigestBenchmark.getRelFromDigestToRelMap                             false        1                          1  avgt    5          0.113 ± 0.009  us/op
DigestBenchmark.getRelFromDigestToRelMap:Max memory heap             false        1                          1  avgt    5  376963072.000          bytes
DigestBenchmark.getRelFromDigestToRelMap                             false        1                         10  avgt    5          0.146 ± 0.029  us/op
DigestBenchmark.getRelFromDigestToRelMap:Max memory heap             false        1                         10  avgt    5  346554368.000          bytes
DigestBenchmark.getRelFromDigestToRelMap                             false        1                        100  avgt    5          0.138 ± 0.014  us/op
DigestBenchmark.getRelFromDigestToRelMap:Max memory heap             false        1                        100  avgt    5  348127232.000          bytes
DigestBenchmark.getRelFromDigestToRelMap                             false       10                          1  avgt    5          0.452 ± 0.041  us/op
DigestBenchmark.getRelFromDigestToRelMap:Max memory heap             false       10                          1  avgt    5  397934592.000          bytes
DigestBenchmark.getRelFromDigestToRelMap                             false       10                         10  avgt    5          0.450 ± 0.050  us/op
DigestBenchmark.getRelFromDigestToRelMap:Max memory heap             false       10                         10  avgt    5  383254528.000          bytes
DigestBenchmark.getRelFromDigestToRelMap                             false       10                        100  avgt    5          0.452 ± 0.085  us/op
DigestBenchmark.getRelFromDigestToRelMap:Max memory heap             false       10                        100  avgt    5  353894400.000          bytes
DigestBenchmark.getRelFromDigestToRelMap                             false       20                          1  avgt    5          0.819 ± 0.239  us/op
DigestBenchmark.getRelFromDigestToRelMap:Max memory heap             false       20                          1  avgt    5  327155712.000          bytes
DigestBenchmark.getRelFromDigestToRelMap                             false       20                         10  avgt    5          0.814 ± 0.123  us/op
DigestBenchmark.getRelFromDigestToRelMap:Max memory heap             false       20                         10  avgt    5  427819008.000          bytes
DigestBenchmark.getRelFromDigestToRelMap                             false       20                        100  avgt    5          0.844 ± 0.218  us/op
DigestBenchmark.getRelFromDigestToRelMap:Max memory heap             false       20                        100  avgt    5  366477312.000          bytes
{code}


{code:xml}
Benchmark                                                 (isStringDigest)  (joins)  (whereClauseDisjunctions)  Mode  Cnt          Score   Error  Units
DigestBenchmark.getRelFromDigestToRelMap                              true        1                          1  avgt    5          1.797 ± 0.218  us/op
DigestBenchmark.getRelFromDigestToRelMap:Max memory heap              true        1                          1  avgt    5  412090368.000          bytes
DigestBenchmark.getRelFromDigestToRelMap                              true        1                         10  avgt    5          1.824 ± 0.147  us/op
DigestBenchmark.getRelFromDigestToRelMap:Max memory heap              true        1                         10  avgt    5  405274624.000          bytes
DigestBenchmark.getRelFromDigestToRelMap                              true        1                        100  avgt    5          2.109 ± 0.453  us/op
DigestBenchmark.getRelFromDigestToRelMap:Max memory heap              true        1                        100  avgt    5  402653184.000          bytes
DigestBenchmark.getRelFromDigestToRelMap                              true       10                          1  avgt    5         12.118 ± 0.113  us/op
DigestBenchmark.getRelFromDigestToRelMap:Max memory heap              true       10                          1  avgt    5  346030080.000          bytes
DigestBenchmark.getRelFromDigestToRelMap                              true       10                         10  avgt    5         12.231 ± 0.807  us/op
DigestBenchmark.getRelFromDigestToRelMap:Max memory heap              true       10                         10  avgt    5  438304768.000          bytes
DigestBenchmark.getRelFromDigestToRelMap                              true       10                        100  avgt    5         12.102 ± 0.243  us/op
DigestBenchmark.getRelFromDigestToRelMap:Max memory heap              true       10                        100  avgt    5  412090368.000          bytes
DigestBenchmark.getRelFromDigestToRelMap                              true       20                          1  avgt    5         31.184 ± 0.347  us/op
DigestBenchmark.getRelFromDigestToRelMap:Max memory heap              true       20                          1  avgt    5  357564416.000          bytes
DigestBenchmark.getRelFromDigestToRelMap                              true       20                         10  avgt    5         32.900 ± 1.832  us/op
DigestBenchmark.getRelFromDigestToRelMap:Max memory heap              true       20                         10  avgt    5  322437120.000          bytes
DigestBenchmark.getRelFromDigestToRelMap                              true       20                        100  avgt    5         32.072 ± 1.185  us/op
DigestBenchmark.getRelFromDigestToRelMap:Max memory heap              true       20                        100  avgt    5  309329920.000          bytes
{code}


To reduce interference, I ran the old and new implementations in two separate 
JVMs. The results show an impressive performance improvement (about 20x 
overall; from roughly 12x with 1 join to roughly 40x with 20 joins in the 
numbers above).
For memory usage, when there are fewer than 10 join nodes, there is about a 
10% improvement, but with 20 join nodes the data fluctuates.

I used the max used heap memory as the metric. Is there a better way to 
compare memory usage here?

[1] 
https://github.com/danny0405/calcite/commit/848bafba39bee0de8399a5906885d0960b33397d
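
On the memory question, one candidate metric that may be less noisy than max 
heap is bytes allocated per operation. JMH's GC profiler (`-prof gc`) reports 
this as `gc.alloc.rate.norm`. Outside JMH, a rough sketch of the same idea is 
below. It assumes a HotSpot JVM that exposes 
`com.sun.management.ThreadMXBean`; the `AllocationProbe` class and the demo 
workload are hypothetical and are not part of the benchmark in [1].

```java
import java.lang.management.ManagementFactory;

/** Hypothetical helper: measures bytes allocated by the current thread while a task runs. */
final class AllocationProbe {
  /** Returns the allocated byte count, or -1 if the JVM does not support the measurement. */
  static long allocatedBytes(Runnable task) {
    java.lang.management.ThreadMXBean std = ManagementFactory.getThreadMXBean();
    if (!(std instanceof com.sun.management.ThreadMXBean)) {
      return -1; // not a HotSpot-style bean (assumption: HotSpot-specific API)
    }
    com.sun.management.ThreadMXBean bean = (com.sun.management.ThreadMXBean) std;
    if (!bean.isThreadAllocatedMemorySupported()
        || !bean.isThreadAllocatedMemoryEnabled()) {
      return -1;
    }
    long id = Thread.currentThread().getId();
    long before = bean.getThreadAllocatedBytes(id);
    task.run();
    return bean.getThreadAllocatedBytes(id) - before;
  }
}

class AllocationProbeDemo {
  public static void main(String[] args) {
    // Stand-in workload: build a digest-like string by repeated concatenation.
    long bytes = AllocationProbe.allocatedBytes(() -> {
      String s = "";
      for (int i = 0; i < 100; i++) {
        s = s + "$" + i + ", ";
      }
      if (s.isEmpty()) {
        throw new IllegalStateException("unreachable");
      }
    });
    System.out.println("allocated bytes: " + bytes);
  }
}
```

Unlike max heap, this counts what the measured code allocates on its own 
thread, so it does not depend on when the garbage collector happens to run.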

> Add Digest interface to enable efficient hashCode(equals) for RexNode and 
> RelNode
> ---------------------------------------------------------------------------------
>
>                 Key: CALCITE-3786
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3786
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 1.21.0
>            Reporter: Vladimir Sitnikov
>            Assignee: Danny Chen
>            Priority: Major
>             Fix For: 1.24.0
>
>          Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Current digests for RexNode, RelNode, RelType, and similar cases use String 
> concatenation.
> It is easy to implement, however, it has drawbacks:
> 1) String objects cannot be reused. For instance, RexCall has operands, 
> however, the digest is duplicated. It causes extra memory use and extra CPU 
> for string copying
> 2) There's no way to have multiple #toString() methods. RelType might need 
> multiple digests: "including field names", "excluding field names".
> A suggested resolution might be along the lines of
> {code:java}
> class Digest { // immutable
>   final int hashCode; // speedup hashCode and equals
>   final Object[] contents; // The values are either other Digest objects or Strings
>   String toString(); // e.g. for debugging purposes
>   int compareTo(Digest); // e.g. for debugging purposes.
> }
> {code}
> Note how fields in Kotlin are aligned much better, and it makes it easier to 
> read:
> {code:java}
> class Digest { // immutable
>   val hashCode: Int // speedup hashCode and equals
>   val contents: Array<Any> // The values are either other Digest objects or Strings
>   fun toString(): String // e.g. for debugging purposes
>   fun compareTo(other: Digest): Int // e.g. for debugging purposes.
> }
> {code}
> Then the digest for RexCall could be the bits relevant to RexCall itself + 
> digests of the operands (which can be reused as is)
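
The proposal quoted above could be sketched roughly as follows. This is a 
minimal illustration under stated assumptions, not the implementation in 
Calcite or in [1]: the constructor, the varargs signature, and the 
`DigestDemo` class are hypothetical, and `compareTo` simply compares the 
rendered strings, which is only adequate for debugging.

```java
import java.util.Arrays;

/** Illustrative immutable digest: caches its hash and nests child digests instead of copying strings. */
final class Digest implements Comparable<Digest> {
  private final int hashCode;      // computed once in the constructor
  private final Object[] contents; // each element is a Digest or a String

  Digest(Object... contents) {
    this.contents = contents.clone();
    // Child digests contribute their cached hash, so this is linear in the direct children.
    this.hashCode = Arrays.hashCode(this.contents);
  }

  @Override public int hashCode() {
    return hashCode;
  }

  @Override public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof Digest)) {
      return false;
    }
    Digest that = (Digest) o;
    // Cheap rejection on the cached hash before comparing contents element by element.
    return hashCode == that.hashCode && Arrays.equals(contents, that.contents);
  }

  @Override public String toString() {
    // The string form is built only on demand, e.g. for debugging.
    StringBuilder sb = new StringBuilder();
    for (Object c : contents) {
      sb.append(c);
    }
    return sb.toString();
  }

  @Override public int compareTo(Digest other) {
    // Debugging aid only: may return 0 for digests that are not equal().
    return toString().compareTo(other.toString());
  }
}

class DigestDemo {
  public static void main(String[] args) {
    Digest a = new Digest("$0");
    Digest b = new Digest("$1");
    // The call's digest reuses the operand digests as is.
    Digest plus = new Digest("+(", a, ", ", b, ")");
    System.out.println(plus); // prints +($0, $1)
  }
}
```

Because the operand digests are stored by reference, a map lookup keyed on 
`plus` hashes in constant time and never has to concatenate the operand 
strings.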



--
This message was sent by Atlassian Jira
(v8.3.4#803005)