Hi Matt,

I also tried different hash functions (including mumur3) and came to similar 
result.
Faster function but more hash collisions.
Actually these collisions occurs not because of different hash values, but 
because of small hash array size.
Also the way we transform hash value into hash array index (use low bits) also 
may matter.
If we decrease hash load-factor, we get less collisions, but use more memory 
and this also leads to cache misses increase.

You are welcome to try Robin-Hood, in case you get positive results we may 
switch to it (however I'm sceptical)

Thanks. Dmitry.

________________________________________
From: Matt Wilmas <php_li...@realplain.com>
Sent: Friday, April 29, 2016 3:04:44 AM
To: internals@lists.php.net
Cc: Dmitry Stogov; Nikita Popov; Xinchen Hui
Subject: New HashTable implementation?

Hi all,

Last June, it was briefly mentioned about changing PHP's string hash
function [1] (DJB33 *seems* pretty horrible, after all, as far as
collisions...).  So 8 months ago I tried almost, if not, a half-dozen of
them (including Murmur3) that were simple enough to quickly toss in.

The result?  In all cases, I believe, fewer instructions (in zend_hash_find,
and the hashing function), BUT also a slight-to-small increase in cache
misses in Wordpress and other scripts...

And in a test filling an array with a million "string_$i" keys (high
collision pattern for DJB33?), the speed was halved by the *huge* cache miss
increase. :-/

I couldn't quite understand what was happening, where, if there were fewer
collisions...  Misses all spread out in the hash-array?

So there didn't seem to be anything useful to gain there.


Now, after seeing Bogdan's hash optimization idea last month [2], and
reading Nikita's blog post again, I had some ideas I'd like to try --
assuming nobody else is planning major changes. :-)  Besides Nikita, I'm
addressing Dmitry and Xinchen because your names are on some minor hash
items on the 7.1 ideas wiki [4].

I'm thinking of a Robin Hood implementation with Universal hashing [5] (of
int keys and the string hashes).  I haven't touched any code yet, but think
I've worked out all the details in my head, and am ready to take a stab at
it.  I think it's fairly simple to get the basics working; and when I see
how that goes, I can do the additional optimizations I have in mind that it
allows (including reduced memory, on 64-bit at least).

Question: Can I use zval.u1.v.reserved ?  I guess I'll find out otherwise.
:-O


The string hash function itself is really a separate thing [6], but fasthash
[7] (not to be confused with "superfast") looks like a good one that I
missed before...  After thinking about things, I think we could even
keep/use a 64-bit hash on 32-bit arch.


Well, just wanted to mention it first if anyone has a comment. :-)  Should
be interesting, but no idea how it'll perform (lookups should be very, very
fast (upsizing also); but cache factors and inserts/deletes are wildcards).
Wish me luck!?


Thanks,
Matt

[1] https://marc.info/?l=php-internals&m=143444845304138&w=2
[2] https://marc.info/?t=145744248100001&r=1&w=2
[4] https://wiki.php.net/php-7.1-ideas
[5] https://en.wikipedia.org/wiki/Universal_hashing
[6] https://github.com/rurban/smhasher
[7] https://github.com/rurban/smhasher/blob/master/doc/fasthash64


--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

Reply via email to