Yes, that is my thinking as well. I looked in Guava for functions related to hashing but could only find fancy hashes (e.g. MD5 in com.google.common.hash.Hashing). These are only appropriate for long string and binary values. So instead I wrote functions such as Utilities.hash.
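To make the idea concrete, here is a rough sketch of the kind of lightweight combining helper I mean. It is not the actual Calcite code: the class HashSketch and its method names are hypothetical, chosen only for illustration. It also shows why a multiplier of the form 2^k - 1 (31, or the larger 524287 mentioned below) costs no more than one shift and one subtract, since Java int arithmetic wraps the same way in both forms.

```java
/** Hypothetical illustration only; these names are not part of Calcite or Guava. */
final class HashSketch {
  private HashSketch() {}

  /** Combines a running hash h with the next value v using multiplier 31. */
  static int hash31(int h, int v) {
    return 31 * h + v;
  }

  /** Same result as hash31, because 31 * h == (h << 5) - h in wrapping int arithmetic. */
  static int hash31Shift(int h, int v) {
    return (h << 5) - h + v;
  }

  /** Larger multiplier 524287 == 2^19 - 1, still a single shift and subtract. */
  static int hashBig(int h, int v) {
    return (h << 19) - h + v;
  }

  /** Null-safe variant for object fields; a null field contributes 0. */
  static int hash(int h, Object v) {
    return hash31(h, v == null ? 0 : v.hashCode());
  }
}
```

A hashCode() method would then chain calls, e.g. HashSketch.hash(HashSketch.hash(17, name), body). Whether the larger multiplier actually reduces collisions for our key distributions is something a benchmark would have to confirm.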
A larger prime would probably give better hashes (e.g. 524287 == 2^19 - 1) and could still be computed using a single shift and subtract. A few hash functions such as Util.hash(int, int) still use just shift and xor. They should definitely be fixed. I have logged https://issues.apache.org/jira/browse/CALCITE-1071.

Julian

> On Jan 29, 2016, at 9:26 AM, Ted Dunning <[email protected]> wrote:
>
> I think that this suggestion is no longer as valid as it once was.
>
> - modern compilers can reduce multiplication by special values like 31 to
> shift and subtract if they think it faster
>
> - modern integer multiplication is typically as fast as shift either because
> it just is or because memory bandwidth limits
>
> - this multiplication is adjacent to a procedure call that will dominate the
> conversation about speed. Even if inlined, that call to hashcode will be far
> more expensive than a single multiply.
>
> - with big memory in modern machines, speed is often better served by making
> hashing more expensive (murmur or some such) because hash collisions are far
> worse than the cost of a tiny bit of computations. And example is that
> moving to fancy hashes has made most of the low level collection packages
> like mahout collections, fastutil or trove faster rather than slower.
>
> - changing anything like this should only be done if micro benchmarks show a
> significant improvement. Check out jmh.
>
> Sent from my iPhone
>
>> On Jan 28, 2016, at 23:06, Albert <[email protected]> wrote:
>>
>> I've noticed a lot of the places, calcite codes is using something like
>> this:
>>
>> result = 31 * result + (body != null ? body.hashCode() : 0);
>>
>> using multiply in hash code calculation probably isn't best practice.
>> something like shift operator should be more efficient. since the project
>> is already depending on guava, why not using their hash code utils ?
