26-Jul-2013 23:17, Walter Bright пишет:
On 7/26/2013 5:11 AM, Dmitry Olshansky wrote:
26-Jul-2013 14:47, Dmitry Olshansky пишет:
26-Jul-2013 01:25, Walter Bright пишет:

The slowness was in the frackin' "convert the hash to an index in the
bucket", which is a modulus operation.

Then it's past due to finally stop the madness of modulo prime table and
use a power of 2 size. In essence what modulo prime does is simply
enhancing the quality of your hash function w.r.t. collisions (it helps
to distribute values more evenly).

Any of new decent hashes are good enough to work with plain slice the
lower bits approach.


To be more concrete:
Spooky hash
http://burtleburtle.net/bob/hash/spooky.html (Public domain)
S-box hash
http://home.comcast.net/~bretm/hash/10.html (Published paper)

Or even a good ol' FNV (Public domain)
http://isthe.com/chongo/tech/comp/fnv/#FNV-1a


How about a pull request so we can try it out?

Preliminary pull is here:
https://github.com/D-Programming-Language/dmd/pull/2436

So far it looses a bit. I'm still playing with it, looking at load factor, distribution of sizes, profiling.

So far observations are that SpookyHash is slower then the one that was there thus stealing a few percents of speed. That is then hardly regained by a faster lookup of a slot: almost all of "large" tables were 31 in size and you have special case for that anyway.

What bothers me is that while I've been hacking at this I couldn't shake off the feeling that AA code assumes NO FULL HASH COLLISIONS at all?

Isn't that betting on luck (and a crude hash) a little too much (esp in 32 bit mode)?

That is e.g. in code pasted from aav.c Key is only a hash and there is no way whatsoever to discern a full hash collision.

Value _aaGetRvalue(AA* aa, Key key)
{
    //printf("_aaGetRvalue(key = %p)\n", key);
    if (aa)
    {
        size_t i;
        size_t len = aa->b_length;
        if (len == 4)
            i = (size_t)key & 3;
        else if (len == 31)
            i = (size_t)key % 31;
        else
            i = (size_t)key % len;
        aaA* e = aa->b[i];
        while (e)
        {
            if (key == e->key)
                return e->value;
            e = e->next;
        }
    }
    return NULL;    // not found
}

--
Dmitry Olshansky

Reply via email to