Looking through Guava source code the default pattern if you have an object 
whose equals method depends on, say, 3 fields is to call 
Objects.hashCode(field0, field1, field2). This calls Arrays.hashCode(Object[]), 
which produces the standard hash code for Java lists.

This is simple and I assume — since Guava has been thoroughly analyzed and 
tested — performs well. I think we should do the same in Calcite.

To be clear, we are not looking for high performance hash functions. I am 
trying to mitigate against new code getting a bad hash function because the 
developer could not discern what was the project’s standard practice. Using 
Objects.hashCode will be good enough unless proven otherwise.

Julian


> On Jan 30, 2016, at 9:06 AM, Ted Dunning <[email protected]> wrote:
> 
> 
> The performance of larger prime multipliers seems intuitively better, but it 
> isn't necessarily so.  If a better quick hash step is desired, a round of 
> murmur would be much better. 
> 
> For instance, one of the constants in the murmur hash finalizer is even. 
> Reading about the weaknesses of murmur 2 is very instructive. The lesson is 
> that picking numbers from thin air isn't a great idea. 
> 
> Knuth also has some very good (if very old and dated) material on the topic. 
> 
> The first question is whether this matters. Is a significant chance of 
> collision actually even material?
> 
> If yes, using a high quality hash like murmur3 should be tested.  Especially 
> if the hash can be cached, the cost is often surprisingly low. 
> 
> Sent from my iPhone
> 
>> On Jan 29, 2016, at 12:25, Julian Hyde <[email protected]> wrote:
>> 
>> A larger prime would probably give better hashes (e.g. 524287 == 2 ^ 19 - 1) 
>> and could still be computed using a single shift and subtract.
>> 
>> A few hash functions such as Util.hash(int, int) still use just shift and 
>> xor. They should definitely be fixed. I have logged 
>> https://issues.apache.org/jira/browse/CALCITE-1071.

Reply via email to