Re: Absolutely horrible default string hashing

Andrei Alexandrescu Sun, 03 May 2009 12:00:12 -0700

dsimcha wrote:

== Quote from Steve Teale ([email protected])'s article

Kristian Kilpi Wrote:

On Sun, 03 May 2009 15:33:12 +0300, J�r�me M. Berger <[email protected]>
wrote:

|
| foreach(c; str)
| {
|     ret = (ret << 4) + c;
| }
|
        That one is very bad because it only takes into account the last
few characters of the string (how many depends on the size of the
hash). However, several hashing algorithms use 2^n+1 as their
multiplier, which is very fast even on old/small/embedded hardware
because it can be implemented as a shift and an addition.

You are of course right; thanks for the correction. Lets scrap that
algorithm! :)

Would something like
foreach(i, c; str)
{
   ret ^= cast(uint) c;
   if (i & 1)
      ret <<= 1;
}
do a better job. That could cover the first 64 chars reasonably well, and I

suspect that could cover a lot of use cases.


I guess a lot of people missed my follow-up post on this.

Yes. Such a minor oversight in a face-to-face conversation takes fiveseconds to fix. In a newsgroup, it becomes a long thread :o).

This is largely a
typeinfo bug, not a string hashing bug.  typeid(string) returns a TypeInfo
reference to generic array TypeInfo, whereas typeid(char[]) returns a TypeInfo
reference for which the hashing works well.  What is needed is for
typeid(typemodifier(char)[]) to return the same thing as typeid(char[]).

Then again, I guess that, given how bad the generic array hashing is, it 
wouldn't
hurt to improve that, too.  If someone implements a decent meta-hashing function
(i.e. take a bunch of hashes of elements of some object, like an array, a tuple,
or a struct, and produce a single good hash out of these), there would be no
shortage of use cases for it.

I have exchanged email with Paul Hsieh about using his hash function inPhobos (azillionmonkeys.com) and he said his license is very permissive.I seem to recall I have introduced his hash function, with credit, inPhobos' old runtime but I am not sure whether Sean took that over. Isuggest we do that in druntime for string, be they mutable or not.



Andrei

Re: Absolutely horrible default string hashing

Reply via email to