[jira] [Commented] (CALCITE-3836) The hash codes of RelNodes are unreliable

Liya Fan (Jira) Thu, 05 Mar 2020 01:30:27 -0800


    [ 
https://issues.apache.org/jira/browse/CALCITE-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051944#comment-17051944
 ]


Liya Fan commented on CALCITE-3836:
-----------------------------------

Hi all, thanks for the fruitful discussion.

I have added a benchmark, and the results seem to show that the changed 
hashCode method is more performant (score = average time per operation):

Before
Benchmark                                  Mode  Cnt    Score   Error  Units
RelNodeBenchmark.getOutwardEdgesBenchmark  avgt    5  107.115 ± 4.869  ns/op

After
Benchmark                                  Mode  Cnt   Score   Error  Units
RelNodeBenchmark.getOutwardEdgesBenchmark  avgt    5  33.700 ± 0.092  ns/op

However, I don't think performance is the only factor to consider. The quality 
of the hash code is also important, as a poor hash code may ultimately lead to 
a performance penalty. The quality of the identity hash code is something that 
we cannot control, and it may vary from run to run. 

Another problem is stability. The identity hash code introduces randomness 
into the program, making problems hard to reproduce and debug. We have 
observed that the results produced by Calcite may vary from run to run, so we 
often need to run the program several times before the problem emerges again. 
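
To make the stability point concrete, here is a minimal sketch. The classes 
IdentityNode and IdNode are hypothetical and are not Calcite code. It compares 
the HashSet iteration order of nodes that keep the identity hash code with 
nodes whose hash code comes from an id. The id-based order is the same on 
every run for a given JDK, while the identity-based order may differ from run 
to run.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HashOrderDemo {
    // Hypothetical node that keeps the default identity hash code.
    static final class IdentityNode {
        final int id;
        IdentityNode(int id) { this.id = id; }
    }

    // Hypothetical node whose hash code is derived solely from its id.
    static final class IdNode {
        final int id;
        IdNode(int id) { this.id = id; }
        @Override public int hashCode() { return id; }
        @Override public boolean equals(Object o) { return this == o; }
    }

    // Ids in HashSet iteration order when the hash code is the id.
    static List<Integer> idHashedOrder(int[] ids) {
        Set<IdNode> set = new HashSet<>();
        for (int id : ids) {
            set.add(new IdNode(id));
        }
        List<Integer> order = new ArrayList<>();
        for (IdNode n : set) {
            order.add(n.id);
        }
        return order;
    }

    // Ids in HashSet iteration order when the identity hash code is used.
    static List<Integer> identityHashedOrder(int[] ids) {
        Set<IdentityNode> set = new HashSet<>();
        for (int id : ids) {
            set.add(new IdentityNode(id));
        }
        List<Integer> order = new ArrayList<>();
        for (IdentityNode n : set) {
            order.add(n.id);
        }
        return order;
    }

    public static void main(String[] args) {
        int[] ids = {7, 3, 11, 5, 2};
        // Same order on every run for a given JDK.
        System.out.println("id hash:       " + idHashedOrder(ids));
        // Order is not guaranteed to repeat across runs.
        System.out.println("identity hash: " + identityHashedOrder(ids));
    }
}
```

Any code that iterates over a hash-based collection of nodes inherits this 
difference, which is one way run-to-run variation can reach the final result.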

I think this is one step towards solving this problem. 



> The hash codes of RelNodes are unreliable
> -----------------------------------------
>
>                 Key: CALCITE-3836
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3836
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>            Reporter: Liya Fan
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> For all sub-classes of AbstractRelNode, the {{hashCode}} methods depend on 
> {{AbstractRelNode#hashCode}}, because it is declared as final. 
> {{AbstractRelNode#hashCode}} depends on {{Object#hashCode}}, which is called 
> the identity hash code. The details of the identity hash code depend on the 
> specific JVM implementation. For many JVMs, the implementation is based on 
> the object's address in memory. The problem is that the address of an object 
> may change in a JVM, due to GC, memory compaction, etc. So the hash code of 
> an object may change, even if the content of the object is not changed (this 
> can be confirmed from the JavaDoc of {{Object#hashCode}}). 
> This problem may cause severe issues that are hard to diagnose and debug, 
> such as an object that is in a hash table but cannot be retrieved, or 
> duplicate objects in a hash map. 
> To solve the problem, we compute a hash code solely from the node id. This is 
> consistent with the previous semantics, and solves the above problem. 
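
The proposal in the description can be sketched roughly as follows. This is 
an illustration only and not the actual patch: BaseNode and ScanNode are 
hypothetical stand-ins for AbstractRelNode and a subclass, and the id counter 
and the reference-based equals are assumptions made to keep the example 
self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class IdHashSketch {
    // Hypothetical stand-in for AbstractRelNode; names and fields are assumptions.
    abstract static class BaseNode {
        private static final AtomicInteger NEXT_ID = new AtomicInteger(0);

        // Each node receives a unique id when it is created.
        protected final int id = NEXT_ID.getAndIncrement();

        // Hash code computed solely from the node id; final, so subclasses inherit it.
        @Override public final int hashCode() {
            return id;
        }

        // Assumption: nodes are compared by reference, so distinct nodes stay distinct.
        @Override public final boolean equals(Object obj) {
            return this == obj;
        }
    }

    // Hypothetical subclass with some content of its own.
    static final class ScanNode extends BaseNode {
        final String table;
        ScanNode(String table) { this.table = table; }
    }

    public static void main(String[] args) {
        ScanNode a = new ScanNode("emp");
        ScanNode b = new ScanNode("emp");

        Map<BaseNode, String> map = new HashMap<>();
        map.put(a, "first");
        map.put(b, "second");

        // Two distinct nodes remain two distinct keys, as with the identity hash code.
        System.out.println(map.size());
        System.out.println(map.get(a));
    }
}
```

Because the id never changes after construction, the hash code does not depend 
on anything the JVM decides at run time.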



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
