[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138427#comment-17138427 ]
Danny Chen commented on CALCITE-3786:
-------------------------------------
Thanks, [~vladimirsitnikov] and [~zabetak]. I have addressed the comments [1].
> Can you please clarify what is the number of objects in DigestToRelMap?
It holds the digest-to-node mapping for the rel and all of its inputs, in order to
simulate the planner's node #register (I know there is also a RelSubSet node there,
but the effect should be the same).
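To make the answer above concrete, here is a minimal sketch of what such a map holds. It assumes a hypothetical {{Node}} type (an operator description plus inputs) in place of Calcite's RelNode, and a hypothetical {{register}} method; it is not Calcite's implementation. Registering a tree walks the inputs bottom-up and stores one digest-to-node entry per distinct subtree, so equivalent trees map to the same node, much like the planner deduplicates equivalent rels. The digest key is a {{List}} of the operator description and the already-canonical inputs.

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DigestToRelMapSketch {
  /** Hypothetical stand-in for a RelNode: an operator description plus inputs. */
  public static final class Node {
    final String op;
    final List<Node> inputs;

    public Node(String op, Node... inputs) {
      this.op = op;
      this.inputs = Arrays.asList(inputs);
    }
  }

  /** Digest -> first registered equivalent node. */
  private final Map<List<Object>, Node> digestToRel = new HashMap<>();

  /** Registers {@code node} and, recursively, all of its inputs; returns the canonical node. */
  public Node register(Node node) {
    List<Object> digest = new ArrayList<>();
    digest.add(node.op);
    for (Node input : node.inputs) {
      // Inputs are canonicalized first, so the canonical input's identity
      // (Node keeps Object's identity equals/hashCode) is enough for the key.
      digest.add(register(input));
    }
    Node existing = digestToRel.putIfAbsent(digest, node);
    return existing != null ? existing : node;
  }

  /** Number of entries, i.e. the number of distinct subtrees registered so far. */
  public int size() {
    return digestToRel.size();
  }
}
{code}

In this sketch the number of objects in the map equals the number of distinct subtrees: a join over two identical scans produces two entries (the scan and the join), not three.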
I have changed the JMH fixture level from {{Level.Invocation}} to {{Level.Iteration}}.
Here is the latest data.
Performance comparison (OBJECT vs. STRING digests):
{code:java}
Benchmark               (digestType)  (disjunctions)  (joins)  Mode  Cnt   Score   Error  Units
DigestBenchmark.getRel        OBJECT               1        1  avgt    5   0.123 ± 0.004  us/op
DigestBenchmark.getRel        OBJECT               1       10  avgt    5   0.447 ± 0.023  us/op
DigestBenchmark.getRel        OBJECT               1       20  avgt    5   0.868 ± 0.085  us/op
DigestBenchmark.getRel        OBJECT              10        1  avgt    5   0.126 ± 0.014  us/op
DigestBenchmark.getRel        OBJECT              10       10  avgt    5   0.459 ± 0.029  us/op
DigestBenchmark.getRel        OBJECT              10       20  avgt    5   0.920 ± 0.147  us/op
DigestBenchmark.getRel        OBJECT             100        1  avgt    5   0.119 ± 0.008  us/op
DigestBenchmark.getRel        OBJECT             100       10  avgt    5   0.452 ± 0.030  us/op
DigestBenchmark.getRel        OBJECT             100       20  avgt    5   0.857 ± 0.109  us/op
DigestBenchmark.getRel        STRING               1        1  avgt    5   1.320 ± 0.049  us/op
DigestBenchmark.getRel        STRING               1       10  avgt    5  10.588 ± 0.088  us/op
DigestBenchmark.getRel        STRING               1       20  avgt    5  27.863 ± 0.320  us/op
DigestBenchmark.getRel        STRING              10        1  avgt    5   1.352 ± 0.028  us/op
DigestBenchmark.getRel        STRING              10       10  avgt    5  10.612 ± 0.286  us/op
DigestBenchmark.getRel        STRING              10       20  avgt    5  27.865 ± 1.627  us/op
DigestBenchmark.getRel        STRING             100        1  avgt    5   1.467 ± 0.683  us/op
DigestBenchmark.getRel        STRING             100       10  avgt    5  10.738 ± 0.075  us/op
DigestBenchmark.getRel        STRING             100       20  avgt    5  28.211 ± 0.449  us/op
{code}
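One way to read the table is to divide the STRING score by the OBJECT score for the same parameters. The small program below (class and method names are hypothetical) does that for the disjunctions = 1 rows, with the values copied from the table and the error bars ignored. The ratio grows with the number of joins, from roughly 10.7x at 1 join to roughly 23.7x at 10 joins and 32.1x at 20 joins.

{code:java}
public class DigestSpeedup {
  /** Rows of {joins, OBJECT score, STRING score} in us/op, disjunctions = 1, from the table above. */
  static final double[][] ROWS = {
      {1, 0.123, 1.320},
      {10, 0.447, 10.588},
      {20, 0.868, 27.863},
  };

  /** How many times slower the STRING digest lookup is than the OBJECT one. */
  static double ratio(double objectScore, double stringScore) {
    return stringScore / objectScore;
  }

  public static void main(String[] args) {
    for (double[] row : ROWS) {
      System.out.printf("joins=%d  STRING/OBJECT = %.1fx%n", (int) row[0], ratio(row[1], row[2]));
    }
  }
}
{code}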
Memory usage comparison (OBJECT vs. STRING digests):
{code:java}
Benchmark                               (digestType)  (disjunctions)  (joins)  Mode  Cnt          Score  Error  Units
DigestBenchmark.getRel:Max memory heap        OBJECT               1        1  avgt    5  228065280.000         bytes
DigestBenchmark.getRel:Max memory heap        OBJECT               1       10  avgt    5  211812352.000         bytes
DigestBenchmark.getRel:Max memory heap        OBJECT               1       20  avgt    5  215482368.000         bytes
DigestBenchmark.getRel:Max memory heap        OBJECT              10        1  avgt    5  239599616.000         bytes
DigestBenchmark.getRel:Max memory heap        OBJECT              10       10  avgt    5  218628096.000         bytes
DigestBenchmark.getRel:Max memory heap        OBJECT              10       20  avgt    5  257949696.000         bytes
DigestBenchmark.getRel:Max memory heap        OBJECT             100        1  avgt    5  258998272.000         bytes
DigestBenchmark.getRel:Max memory heap        OBJECT             100       10  avgt    5  211812352.000         bytes
DigestBenchmark.getRel:Max memory heap        OBJECT             100       20  avgt    5  213385216.000         bytes
DigestBenchmark.getRel:Max memory heap        STRING               1        1  avgt    5  300417024.000         bytes
DigestBenchmark.getRel:Max memory heap        STRING               1       10  avgt    5  262144000.000         bytes
DigestBenchmark.getRel:Max memory heap        STRING               1       20  avgt    5  242745344.000         bytes
DigestBenchmark.getRel:Max memory heap        STRING              10        1  avgt    5  317194240.000         bytes
DigestBenchmark.getRel:Max memory heap        STRING              10       10  avgt    5  273154048.000         bytes
DigestBenchmark.getRel:Max memory heap        STRING              10       20  avgt    5  258473984.000         bytes
DigestBenchmark.getRel:Max memory heap        STRING             100        1  avgt    5  386924544.000         bytes
DigestBenchmark.getRel:Max memory heap        STRING             100       10  avgt    5  262144000.000         bytes
DigestBenchmark.getRel:Max memory heap        STRING             100       20  avgt    5  235405312.000         bytes
{code}
I still use max heap memory usage because it is the most straightforward
metric for illustrating memory consumption.
[1] https://github.com/danny0405/calcite/commit/fe7e82cfe9ab124ee6aad929367e09c755d3a967
> Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
> ---------------------------------------------------------------------------------
>
> Key: CALCITE-3786
> URL: https://issues.apache.org/jira/browse/CALCITE-3786
> Project: Calcite
> Issue Type: New Feature
> Components: core
> Affects Versions: 1.21.0
> Reporter: Vladimir Sitnikov
> Assignee: Danny Chen
> Priority: Major
> Fix For: 1.24.0
>
> Time Spent: 5h 20m
> Remaining Estimate: 0h
>
> Current digests for RexNode, RelNode, RelType, and similar cases use String
> concatenation.
> It is easy to implement; however, it has drawbacks:
> 1) String objects cannot be reused. For instance, RexCall has operands, yet
> the operands' digests are duplicated inside the call's digest. This causes extra
> memory use and extra CPU time for string copying.
> 2) There's no way to have multiple #toString() methods. RelType might need
> multiple digests: "including field names" and "excluding field names".
> A suggested resolution might be along the lines of
> {code:java}
> class Digest { // immutable
>   final int hashCode; // speedup hashCode and equals
>   final Object[] contents; // The values are either other Digest objects or Strings
>   String toString(); // e.g. for debugging purposes
>   int compareTo(Digest other); // e.g. for debugging purposes.
> }
> {code}
> Note how the fields align much better in Kotlin, which makes the code easier to
> read:
> {code:java}
> class Digest { // immutable
>   val hashCode: Int // speedup hashCode and equals
>   val contents: Array<Any> // The values are either other Digest objects or Strings
>   override fun toString(): String // e.g. for debugging purposes
>   fun compareTo(other: Digest): Int // e.g. for debugging purposes.
> }
> {code}
> Then the digest for RexCall could be the bits relevant to RexCall itself plus the
> digests of the operands (which can be reused as is).
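The proposed Digest can be sketched in plain Java as below. This is a minimal sketch under stated assumptions, not the implementation in the commit linked as [1]: the names {{DigestSketch}} and {{appendTo}} are hypothetical, {{contents}} may hold only Strings or nested Digests, the hash is computed once in the constructor so {{hashCode()}} is O(1) and {{equals}} can reject most mismatches cheaply, and {{compareTo}} compares rendered strings and is meant for debugging only. The {{main}} method builds a RexCall-like digest that references its operand digests instead of copying their text.

{code:java}
import java.util.Arrays;

public final class DigestSketch {
  /** Immutable digest: contents are Strings or other Digests; the hash is computed once. */
  public static final class Digest implements Comparable<Digest> {
    private final Object[] contents;
    private final int hashCode;

    public Digest(Object... contents) {
      for (Object o : contents) {
        if (!(o instanceof String) && !(o instanceof Digest)) {
          throw new IllegalArgumentException("expected String or Digest: " + o);
        }
      }
      this.contents = contents.clone();
      // Nested Digests contribute their cached hash, so this does not re-walk the tree.
      this.hashCode = Arrays.hashCode(this.contents);
    }

    @Override public int hashCode() {
      return hashCode;
    }

    @Override public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof Digest)) {
        return false;
      }
      Digest that = (Digest) o;
      // Cheap rejection via the cached hash before the element-wise comparison.
      return hashCode == that.hashCode && Arrays.equals(contents, that.contents);
    }

    @Override public String toString() {
      StringBuilder sb = new StringBuilder();
      appendTo(sb);
      return sb.toString();
    }

    private void appendTo(StringBuilder sb) {
      for (Object o : contents) {
        if (o instanceof Digest) {
          ((Digest) o).appendTo(sb);
        } else {
          sb.append((String) o);
        }
      }
    }

    /** Debugging only: may return 0 for structurally different digests that render alike. */
    @Override public int compareTo(Digest other) {
      return toString().compareTo(other.toString());
    }
  }

  public static void main(String[] args) {
    Digest ref0 = new Digest("$0");
    Digest one = new Digest("1");
    // The call's digest references the operand digests instead of copying their text.
    Digest plus = new Digest("+(", ref0, ", ", one, ")");
    System.out.println(plus); // +($0, 1)
  }
}
{code}

Keeping operands as nested objects is what allows reuse: the string form is produced only on demand (for example, for logging), while hashing and equality work on the shared structure.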
--
This message was sent by Atlassian Jira
(v8.3.4#803005)