26-Jul-2013 23:17, Walter Bright пишет:
On 7/26/2013 5:11 AM, Dmitry Olshansky wrote:
26-Jul-2013 14:47, Dmitry Olshansky пишет:
26-Jul-2013 01:25, Walter Bright пишет:
The slowness was in the frackin' "convert the hash to an index in the
bucket", which is a modulus operation.
Then it's past due to finally stop the madness of modulo prime table and
use a power of 2 size. In essence what modulo prime does is simply
enhancing the quality of your hash function w.r.t. collisions (it helps
to distribute values more evenly).
Any of new decent hashes are good enough to work with plain slice the
lower bits approach.
To be more concrete:
http://burtleburtle.net/bob/hash/spooky.html (Public domain)
http://home.comcast.net/~bretm/hash/10.html (Published paper)
Or even a good ol' FNV (Public domain)
How about a pull request so we can try it out?
Preliminary pull is here:
So far it looses a bit. I'm still playing with it, looking at load
factor, distribution of sizes, profiling.
So far observations are that SpookyHash is slower then the one that was
there thus stealing a few percents of speed. That is then hardly
regained by a faster lookup of a slot:
almost all of "large" tables were 31 in size and you have special case
for that anyway.
What bothers me is that while I've been hacking at this I couldn't shake
off the feeling that AA code assumes NO FULL HASH COLLISIONS at all?
Isn't that betting on luck (and a crude hash) a little too much (esp in
32 bit mode)?
That is e.g. in code pasted from aav.c Key is only a hash and there is
no way whatsoever to discern a full hash collision.
Value _aaGetRvalue(AA* aa, Key key)
//printf("_aaGetRvalue(key = %p)\n", key);
size_t len = aa->b_length;
if (len == 4)
i = (size_t)key & 3;
else if (len == 31)
i = (size_t)key % 31;
i = (size_t)key % len;
aaA* e = aa->b[i];
if (key == e->key)
e = e->next;
return NULL; // not found