[
https://issues.apache.org/jira/browse/CALCITE-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051944#comment-17051944
]
Liya Fan commented on CALCITE-3836:
-----------------------------------
Hi all, thanks for the fruitful discussion.
I have added a benchmark, and the results seem to show that the changed
hashCode method is more performant (score = average time; lower is better):
Before:

Benchmark                                  Mode  Cnt    Score   Error  Units
RelNodeBenchmark.getOutwardEdgesBenchmark  avgt    5  107.115 ± 4.869  ns/op

After:

Benchmark                                  Mode  Cnt   Score   Error  Units
RelNodeBenchmark.getOutwardEdgesBenchmark  avgt    5  33.700 ± 0.092  ns/op

However, I don't think performance is the only factor to consider. The quality
of the hash code is also important, because a poorly distributed hash code
causes collisions and may ultimately hurt performance. The quality of the
identity hash code is something that we cannot control, and it may vary from
run to run.
Another problem is stability. The identity hash code introduces randomness
into the program, making problems hard to reproduce and debug. We have
observed that the results produced by Calcite may vary from run to run, so we
often need to run the program several times before the problem emerges again.
I think this change is one step towards solving this problem.
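
To illustrate the reproducibility point, here is a minimal sketch. It is not
Calcite code: the Node class and the iterationOrder helper are hypothetical
names, and the int id simply stands in for a node id. When the hash code is
derived only from the id, the hash values are the same in every run, whereas
the value returned by System.identityHashCode is chosen by the JVM.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class IdHashDemo {
  /** Hypothetical stand-in for a plan node; not Calcite's actual class. */
  static final class Node {
    private final int id;

    Node(int id) {
      this.id = id;
    }

    // Hash derived solely from the id, so it does not depend on the JVM run.
    @Override public int hashCode() {
      return Integer.hashCode(id);
    }

    // Identity equality, matching the default Object#equals semantics.
    @Override public boolean equals(Object o) {
      return this == o;
    }

    @Override public String toString() {
      return "Node#" + id;
    }
  }

  /** Fills a HashSet with nodes of the given ids and records the iteration order. */
  static List<String> iterationOrder(int... ids) {
    Set<Node> set = new HashSet<>();
    for (int id : ids) {
      set.add(new Node(id));
    }
    List<String> order = new ArrayList<>();
    for (Node n : set) {
      order.add(n.toString());
    }
    return order;
  }

  public static void main(String[] args) {
    // Depends only on the ids, so repeated runs print the same order.
    System.out.println(iterationOrder(3, 1, 2));
    // Chosen by the JVM; the value is not guaranteed to repeat across runs.
    System.out.println(System.identityHashCode(new Object()));
  }
}
```

With id-based hashing, anything that iterates over a hash set or hash map of
such nodes sees an order that depends only on the ids and the insertion
sequence, which is what makes a failing run repeatable.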
> The hash codes of RelNodes are unreliable
> -----------------------------------------
>
> Key: CALCITE-3836
> URL: https://issues.apache.org/jira/browse/CALCITE-3836
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Reporter: Liya Fan
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> For all sub-classes of AbstractRelNode, the {{hashCode}} methods depend on
> {{AbstractRelNode#hashCode}}, because it is declared as final.
> {{AbstractRelNode#hashCode}} depends on {{Object#hashCode}}, which is called
> identity hash code. The details of identity hash code depend on the specific
> JVM implementation. For many JVMs, the implementation is based on the object
> address in the memory. The problem is that, the address of an object may
> change in a JVM, due to GC, memory compaction, etc. So the hash code of an
> object may change, even if the content of the object is not changed (This can
> be confirmed from the JavaDoc of {{Object#hashCode}}).
> This problem may cause severe issues that are hard to diagnose and debug,
> such as an object that is in a hash table but cannot be retrieved, or
> duplicate objects in a hash map.
> To solve the problem, we compute a hash code solely from the node id. This is
> consistent with the previous semantics, and solves the above problem.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)