Yes, that is my thinking as well. I looked in Guava for functions related to 
hashing but could only find fancy hashes (e.g. MD5 in 
com.google.common.hash.Hashing). These are only appropriate for long strings and 
binary values. So instead I wrote functions such as Utilities.hash.
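For the common case of combining a few fields, the JDK itself (java.util.Objects, Java 7 and later) already implements the 31-based pattern from Albert's snippet quoted at the bottom of this thread. The following is a minimal sketch with hypothetical names (HashCombineSketch, manualHash, jdkHash). It shows that the hand-written loop and Objects.hash agree when the accumulator is seeded with 1:

```java
import java.util.Objects;

// Hypothetical helper class, for illustration only.
final class HashCombineSketch {
  // Hand-written form of the 31-based pattern quoted in this thread,
  // seeded with 1 the way java.util.Arrays.hashCode(Object[]) is.
  static int manualHash(Object name, Object body) {
    int result = 1;
    result = 31 * result + (name != null ? name.hashCode() : 0);
    result = 31 * result + (body != null ? body.hashCode() : 0);
    return result;
  }

  // Same value via the JDK helper: Objects.hash(a, b) returns
  // Arrays.hashCode(new Object[] {a, b}), which treats null as 0.
  static int jdkHash(Object name, Object body) {
    return Objects.hash(name, body);
  }

  public static void main(String[] args) {
    System.out.println(manualHash("join", null) == jdkHash("join", null)); // true
  }
}
```

Objects.hash allocates a varargs array on each call, which is one reason hand-written combiners are still common in hot code.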

A larger prime would probably give better hashes (e.g. 524287 == 2^19 - 1) 
and could still be computed using a single shift and subtract.
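A Mersenne prime stays cheap because of the identity h * (2^k - 1) == (h << k) - h. This holds exactly under Java's wrapping 32-bit int arithmetic. The following small sketch uses hypothetical names (MersennePrimeHash, combineMultiply, combineShift) and checks the identity for 524287 and for the familiar 31:

```java
// Hypothetical class, for illustration only.
final class MersennePrimeHash {
  // 2^19 - 1 = 524287, the larger prime suggested above.
  static final int PRIME = 524287;

  // Multiply-based combine: h * PRIME + v, with int overflow wrapping.
  static int combineMultiply(int h, int v) {
    return h * PRIME + v;
  }

  // Same value using one shift and one subtract:
  // h * (2^19 - 1) == (h << 19) - h in 32-bit two's-complement arithmetic.
  static int combineShift(int h, int v) {
    return ((h << 19) - h) + v;
  }

  public static void main(String[] args) {
    int h = 0x12345678;
    System.out.println(combineMultiply(h, 42) == combineShift(h, 42)); // true
    // The familiar 31 works the same way: 31 * h == (h << 5) - h.
    System.out.println(31 * h == (h << 5) - h); // true
  }
}
```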

A few hash functions such as Util.hash(int, int) still use just shift and xor. 
They should definitely be fixed. I have logged 
https://issues.apache.org/jira/browse/CALCITE-1071.
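A pure shift-and-xor combiner has one concrete weakness. The left shift throws away the high bits of its first argument. Multiplying by an odd constant avoids this because it is invertible modulo 2^32, so it loses nothing. The sketch below uses a hypothetical (i << 4) ^ j combiner to illustrate the problem. It is not necessarily the exact body of Util.hash(int, int):

```java
// Hypothetical class, for illustration only.
final class ShiftXorVsMultiply {
  // Hypothetical shift-and-xor combiner of the kind criticized above.
  // The top 4 bits of i are shifted out and never influence the result.
  static int shiftXor(int i, int j) {
    return (i << 4) ^ j;
  }

  // Multiply-by-odd-prime combiner. Because 31 is odd, i -> 31 * i + j is a
  // bijection on 32-bit ints for a fixed j, so no bits of i are discarded.
  static int multiplyAdd(int i, int j) {
    return 31 * i + j;
  }

  public static void main(String[] args) {
    int a = 0;
    int b = 1 << 28; // differs from a only in bit 28, which "<< 4" shifts out
    System.out.println(shiftXor(a, 7) == shiftXor(b, 7));       // true: collision
    System.out.println(multiplyAdd(a, 7) == multiplyAdd(b, 7)); // false
  }
}
```

Multiplication does not remove collisions between different (i, j) pairs. For example, (1, 0) and (0, 31) still collide under 31 * i + j. It only avoids discarding bits of i outright.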

Julian


> On Jan 29, 2016, at 9:26 AM, Ted Dunning <[email protected]> wrote:
> 
> 
> I think that this suggestion is no longer as valid as it once was. 
> 
> - modern compilers can reduce multiplication by special values like 31 to 
> shift and subtract if they think it is faster
> 
> - modern integer multiplication is typically as fast as a shift, either because 
> it just is or because memory bandwidth is the limiting factor
> 
> - this multiplication is adjacent to a procedure call that will dominate the 
> conversation about speed. Even if inlined, that call to hashCode will be far 
> more expensive than a single multiply. 
> 
> - with big memory in modern machines, speed is often better served by making 
> hashing more expensive (Murmur or some such) because hash collisions are far 
> worse than the cost of a tiny bit of computation. An example is that 
> moving to fancy hashes has made most of the low-level collection packages, 
> like Mahout collections, fastutil or Trove, faster rather than slower. 
> 
> - changing anything like this should only be done if microbenchmarks show a 
> significant improvement. Check out JMH. 
> 
> 
>> On Jan 28, 2016, at 23:06, Albert <[email protected]> wrote:
>> 
>> I've noticed that a lot of places in the Calcite code use something like
>> this:
>> 
>> result = 31 * result + (body != null ? body.hashCode() : 0);
>> 
>> Using multiplication in hash code calculation probably isn't best practice;
>> something like a shift operator should be more efficient. Since the project
>> already depends on Guava, why not use its hash code utilities?
