Looking through Guava source code the default pattern if you have an object whose equals method depends on, say, 3 fields is to call Objects.hashCode(field0, field1, field2). This calls Arrays.hashCode(Object[]), which produces the standard hash code for Java lists.
This is simple and I assume — since Guava has been thoroughly analyzed and tested — performs well. I think we should do the same in Calcite. To be clear, we are not looking for high performance hash functions. I am trying to mitigate against new code getting a bad hash function because the developer could not discern what was the project’s standard practice. Using Objects.hashCode will be good enough unless proven otherwise. Julian > On Jan 30, 2016, at 9:06 AM, Ted Dunning <[email protected]> wrote: > > > The performance of larger prime multipliers seems intuitively better, but it > isn't necessarily so. If a better quick hash step is desired, a round of > murmur would be much better. > > For instance, one of the constants in the murmur hash finalizer is even. > Reading about the weaknesses of murmur 2 is very instructive. The lesson is > that picking numbers from thin air isn't a great idea. > > Knuth also has some very good (if very old and dated) material on the topic. > > The first question is whether this matters. Is a significant chance of > collision actually even material? > > If yes, using a high quality hash like murmur3 should be tested. Especially > if the hash can be cached, the cost is often surprisingly low. > > Sent from my iPhone > >> On Jan 29, 2016, at 12:25, Julian Hyde <[email protected]> wrote: >> >> A larger prime would probably give better hashes (e.g. 524287 == 2 ^ 19 - 1) >> and could still be computed using a single shift and subtract. >> >> A few hash functions such as Util.hash(int, int) still use just shift and >> xor. They should definitely be fixed. I have logged >> https://issues.apache.org/jira/browse/CALCITE-1071.
