[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139268#comment-17139268 ]
Danny Chen commented on CALCITE-3786:
-------------------------------------
[~vladimirsitnikov] I ran the benchmark with the JMH GC profiler; here are the results (average time per operation first, then normalized allocation per operation):
{noformat}
Benchmark               (digestType)  (disjunctions)  (joins)  Mode  Cnt   Score   Error  Units
DigestBenchmark.getRel        OBJECT               1        1  avgt    5   0.082 ± 0.010  us/op
DigestBenchmark.getRel        OBJECT               1       10  avgt    5   0.380 ± 0.025  us/op
DigestBenchmark.getRel        OBJECT               1       20  avgt    5   0.732 ± 0.077  us/op
DigestBenchmark.getRel        OBJECT              10        1  avgt    5   0.081 ± 0.010  us/op
DigestBenchmark.getRel        OBJECT              10       10  avgt    5   0.364 ± 0.022  us/op
DigestBenchmark.getRel        OBJECT              10       20  avgt    5   0.697 ± 0.046  us/op
DigestBenchmark.getRel        OBJECT             100        1  avgt    5   0.081 ± 0.008  us/op
DigestBenchmark.getRel        OBJECT             100       10  avgt    5   0.359 ± 0.025  us/op
DigestBenchmark.getRel        OBJECT             100       20  avgt    5   0.726 ± 0.090  us/op
DigestBenchmark.getRel        STRING               1        1  avgt    5   1.269 ± 0.035  us/op
DigestBenchmark.getRel        STRING               1       10  avgt    5  10.609 ± 0.146  us/op
DigestBenchmark.getRel        STRING               1       20  avgt    5  28.708 ± 0.810  us/op
DigestBenchmark.getRel        STRING              10        1  avgt    5   1.365 ± 0.073  us/op
DigestBenchmark.getRel        STRING              10       10  avgt    5  10.640 ± 0.107  us/op
DigestBenchmark.getRel        STRING              10       20  avgt    5  28.171 ± 0.612  us/op
DigestBenchmark.getRel        STRING             100        1  avgt    5   1.354 ± 0.083  us/op
DigestBenchmark.getRel        STRING             100       10  avgt    5  11.583 ± 5.685  us/op
DigestBenchmark.getRel        STRING             100       20  avgt    5  27.828 ± 0.343  us/op

Benchmark                                   (digestType)  (disjunctions)  (joins)  Mode  Cnt      Score   Error  Units
DigestBenchmark.getRel:·gc.alloc.rate.norm        OBJECT               1        1  avgt    5     ≈ 10⁻⁴          B/op
DigestBenchmark.getRel:·gc.alloc.rate.norm        OBJECT               1       10  avgt    5      0.005 ± 0.001  B/op
DigestBenchmark.getRel:·gc.alloc.rate.norm        OBJECT               1       20  avgt    5      0.020 ± 0.002  B/op
DigestBenchmark.getRel:·gc.alloc.rate.norm        OBJECT              10        1  avgt    5      0.001 ± 0.001  B/op
DigestBenchmark.getRel:·gc.alloc.rate.norm        OBJECT              10       10  avgt    5      0.006 ± 0.001  B/op
DigestBenchmark.getRel:·gc.alloc.rate.norm        OBJECT              10       20  avgt    5      0.022 ± 0.001  B/op
DigestBenchmark.getRel:·gc.alloc.rate.norm        OBJECT             100        1  avgt    5      0.003 ± 0.001  B/op
DigestBenchmark.getRel:·gc.alloc.rate.norm        OBJECT             100       10  avgt    5      0.018 ± 0.001  B/op
DigestBenchmark.getRel:·gc.alloc.rate.norm        OBJECT             100       20  avgt    5      0.047 ± 0.006  B/op
DigestBenchmark.getRel:·gc.alloc.rate.norm        STRING               1        1  avgt    5   1840.004 ± 0.001  B/op
DigestBenchmark.getRel:·gc.alloc.rate.norm        STRING               1       10  avgt    5   8568.145 ± 0.002  B/op
DigestBenchmark.getRel:·gc.alloc.rate.norm        STRING               1       20  avgt    5  16008.839 ± 0.022  B/op
DigestBenchmark.getRel:·gc.alloc.rate.norm        STRING              10        1  avgt    5   1960.009 ± 0.001  B/op
DigestBenchmark.getRel:·gc.alloc.rate.norm        STRING              10       10  avgt    5   8568.180 ± 0.003  B/op
DigestBenchmark.getRel:·gc.alloc.rate.norm        STRING              10       20  avgt    5  16008.913 ± 0.018  B/op
DigestBenchmark.getRel:·gc.alloc.rate.norm        STRING             100        1  avgt    5   1960.054 ± 0.003  B/op
DigestBenchmark.getRel:·gc.alloc.rate.norm        STRING             100       10  avgt    5   8568.577 ± 0.284  B/op
DigestBenchmark.getRel:·gc.alloc.rate.norm        STRING             100       20  avgt    5  16009.819 ± 0.024  B/op
{noformat}
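As a reading aid, the snippet below recomputes two ratios using only the disjunctions=1 rows of the results above: how many times slower the STRING digest is than the OBJECT digest, and how many extra bytes the STRING digest allocates per additional join. The class and method names are hypothetical; the numbers are copied verbatim from the table.

{code:java}
// Hypothetical helper: arithmetic on the disjunctions=1 rows of the JMH results.
final class DigestBenchmarkSummary {
  static final int[] JOINS = {1, 10, 20};
  static final double[] OBJECT_US = {0.082, 0.380, 0.732};        // Score, us/op
  static final double[] STRING_US = {1.269, 10.609, 28.708};      // Score, us/op
  static final double[] STRING_BYTES = {1840.004, 8568.145, 16008.839}; // B/op

  // How many times slower the STRING digest is than the OBJECT digest.
  static double slowdown(int i) {
    return STRING_US[i] / OBJECT_US[i];
  }

  // Extra bytes allocated per additional join between two measured points.
  static double bytesPerExtraJoin(int from, int to) {
    return (STRING_BYTES[to] - STRING_BYTES[from]) / (JOINS[to] - JOINS[from]);
  }

  public static void main(String[] args) {
    for (int i = 0; i < JOINS.length; i++) {
      System.out.printf("joins=%2d: STRING is %.1fx slower than OBJECT%n",
          JOINS[i], slowdown(i));
    }
    System.out.printf("STRING allocates ~%.0f B per extra join (1 -> 10)%n",
        bytesPerExtraJoin(0, 1));
    System.out.printf("STRING allocates ~%.0f B per extra join (10 -> 20)%n",
        bytesPerExtraJoin(1, 2));
  }
}
{code}

With these inputs it reports slowdowns of about 15.5x, 27.9x and 39.2x for 1, 10 and 20 joins, and roughly 748 and 744 extra bytes per added join for the STRING variant, while the OBJECT variant allocates effectively nothing per operation.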
> Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
> ---------------------------------------------------------------------------------
>
> Key: CALCITE-3786
> URL: https://issues.apache.org/jira/browse/CALCITE-3786
> Project: Calcite
> Issue Type: New Feature
> Components: core
> Affects Versions: 1.21.0
> Reporter: Vladimir Sitnikov
> Assignee: Danny Chen
> Priority: Major
> Fix For: 1.24.0
>
> Time Spent: 5h 20m
> Remaining Estimate: 0h
>
> Current digests for RexNode, RelNode, RelType, and similar cases use String concatenation.
> It is easy to implement; however, it has drawbacks:
> 1) String objects cannot be reused. For instance, a RexCall has operands, yet its digest duplicates their digest strings. This causes extra memory use and extra CPU for string copying.
> 2) There's no way to have multiple #toString() methods. A RelType might need multiple digests, for example "including field names" and "excluding field names".
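To make the first drawback concrete, the following Java class mimics a digest built by eager String concatenation: each call node copies its operands' digest text into a String of its own. StringDigestNode, inputRef and call are hypothetical names for illustration, not Calcite classes or methods.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical node whose digest is an eagerly concatenated String,
// mimicking the "String concatenation" approach described in the issue.
final class StringDigestNode {
  final String digest; // owns a full copy of every operand's digest text

  private StringDigestNode(String digest) {
    this.digest = digest;
  }

  static StringDigestNode inputRef(int index) {
    return new StringDigestNode("$" + index);
  }

  static StringDigestNode call(String op, StringDigestNode... operands) {
    StringBuilder sb = new StringBuilder(op).append('(');
    for (int i = 0; i < operands.length; i++) {
      if (i > 0) {
        sb.append(", ");
      }
      sb.append(operands[i].digest); // operand text is copied, not shared
    }
    return new StringDigestNode(sb.append(')').toString());
  }

  public static void main(String[] args) {
    // Build +(+(+(+($0, $1), $2), $3), $4), keeping every intermediate call.
    List<StringDigestNode> calls = new ArrayList<>();
    StringDigestNode expr = inputRef(0);
    for (int i = 1; i <= 4; i++) {
      expr = call("+", expr, inputRef(i));
      calls.add(expr);
    }
    int stored = 0;
    for (StringDigestNode c : calls) {
      stored += c.digest.length();
    }
    System.out.println(expr.digest);
    System.out.println("chars in final digest: " + expr.digest.length());
    System.out.println("chars stored across nested calls: " + stored);
  }
}
{code}

For four nested additions the final digest is 30 characters long, yet the four call nodes hold 9 + 16 + 23 + 30 = 78 characters between them, because every level re-copies the text below it. The total grows quadratically with nesting depth.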
> A suggested resolution might be along the lines of
> {code:java}
> class Digest { // immutable
> final int hashCode; // speedup hashCode and equals
> final Object[] contents; // The values are either other Digest objects or Strings
> String toString(); // e.g. for debugging purposes
> int compareTo(Digest other); // e.g. for debugging purposes.
> }
> {code}
> Note how the fields are aligned much better in Kotlin, which makes the sketch easier to read:
> {code:java}
> class Digest { // immutable
> val hashCode: Int // speedup hashCode and equals
> val contents: Array<Any> // The values are either other Digest objects or Strings
> fun toString(): String // e.g. for debugging purposes
> fun compareTo(other: Digest): Int // e.g. for debugging purposes.
> }
> {code}
> Then the digest for a RexCall could consist of the bits relevant to the RexCall itself plus the digests of its operands (which can be reused as is).
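The Digest sketch and the RexCall composition described above can be made concrete with a small, self-contained Java class. This is a sketch under assumptions, not Calcite's implementation: the Object... constructor, the String-or-Digest validation and rendering toString() by concatenating the parts are illustrative choices; only the field roles (a cached hashCode and a contents array holding Strings or other Digests) come from the proposal.

{code:java}
import java.util.Arrays;

// Minimal sketch of the proposed Digest (hypothetical, not Calcite's API).
// Each element of contents is a String or another Digest, so operand
// digests are shared by reference instead of being copied into a new String.
final class Digest implements Comparable<Digest> {
  private final Object[] contents;
  private final int hashCode; // computed once; speeds up hashCode and equals

  Digest(Object... contents) {
    for (Object c : contents) {
      if (!(c instanceof String) && !(c instanceof Digest)) {
        throw new IllegalArgumentException("expected String or Digest: " + c);
      }
    }
    this.contents = contents.clone(); // defensive copy keeps the digest immutable
    // Child digests return their cached hash, so this only walks direct contents.
    this.hashCode = Arrays.hashCode(this.contents);
  }

  @Override
  public int hashCode() {
    return hashCode;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true; // shared operand digests are recognized by identity
    }
    if (!(o instanceof Digest)) {
      return false;
    }
    Digest that = (Digest) o;
    // Cheap rejection on the cached hash before comparing contents.
    return hashCode == that.hashCode && Arrays.equals(contents, that.contents);
  }

  @Override
  public String toString() { // e.g. for debugging purposes
    StringBuilder sb = new StringBuilder();
    appendTo(sb);
    return sb.toString();
  }

  private void appendTo(StringBuilder sb) {
    for (Object c : contents) {
      if (c instanceof Digest) {
        ((Digest) c).appendTo(sb);
      } else {
        sb.append((String) c);
      }
    }
  }

  // Orders by rendered text, e.g. for debugging. Not consistent with equals:
  // new Digest("a", "b") and new Digest("ab") compare as 0 but are not equal.
  @Override
  public int compareTo(Digest other) {
    return toString().compareTo(other.toString());
  }

  public static void main(String[] args) {
    Digest ref0 = new Digest("$0");
    Digest ref1 = new Digest("$1");
    // RexCall-like digest: the call's own bits plus its operands' digests, reused as is.
    Digest plus = new Digest("+(", ref0, ", ", ref1, ")");
    Digest rebuilt = new Digest("+(", new Digest("$0"), ", ", new Digest("$1"), ")");
    System.out.println(plus);                 // prints +($0, $1)
    System.out.println(plus.equals(rebuilt)); // prints true
  }
}
{code}

Because a parent stores only references to its operands' Digest objects plus a cached hash, building +($0, $1) does not copy any operand text. A HashMap keyed by such digests compares the cached hashes first and falls back to element-wise comparison only when they match.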