== Quote from Steve Teale ([email protected])'s article > Kristian Kilpi Wrote: > > On Sun, 03 May 2009 15:33:12 +0300, J�r�me M. Berger <[email protected]> > > wrote: > > > | > > > | foreach(c; str) > > > | { > > > | ret = (ret << 4) + c; > > > | } > > > | > > > That one is very bad because it only takes into account the last > > > few characters of the string (how many depends on the size of the > > > hash). However, several hashing algorithms use 2^n+1 as their > > > multiplier, which is very fast even on old/small/embedded hardware > > > because it can be implemented as a shift and an addition. > > > > You are of course right; thanks for the correction. Lets scrap that > > algorithm! :) > Would something like > foreach(i, c; str) > { > ret ^= cast(uint) c; > if (i & 1) > ret <<= 1; > } > do a better job. That could cover the first 64 chars reasonably well, and I suspect that could cover a lot of use cases.
I guess a lot of people missed my follow-up post on this. This is largely a typeinfo bug, not a string hashing bug. typeid(string) returns a TypeInfo reference to generic array TypeInfo, whereas typeid(char[]) returns a TypeInfo reference for which the hashing works well. What is needed is for typeid(typemodifier(char)[]) to return the same thing as typeid(char[]). Then again, I guess that, given how bad the generic array hashing is, it wouldn't hurt to improve that, too. If someone implements a decent meta-hashing function (i.e. take a bunch of hashes of elements of some object, like an array, a tuple, or a struct, and produce a single good hash out of these), there would be no shortage of use cases for it.
