Hi all,

Last June, it was briefly mentioned about changing PHP's string hash
function [1] (DJB33 *seems* pretty horrible, after all, as far as
collisions...).  So 8 months ago I tried almost, if not, a half-dozen of
them (including Murmur3) that were simple enough to quickly toss in.

The result?  In all cases, I believe, fewer instructions (in zend_hash_find,
and the hashing function), BUT also a slight-to-small increase in cache
misses in Wordpress and other scripts...

And in a test filling an array with a million "string_$i" keys (high
collision pattern for DJB33?), the speed was halved by the *huge* cache miss
increase. :-/

I couldn't quite understand what was happening, where, if there were fewer
collisions...  Misses all spread out in the hash-array?

So there didn't seem to be anything useful to gain there.


Now, after seeing Bogdan's hash optimization idea last month [2], and
reading Nikita's blog post again, I had some ideas I'd like to try --
assuming nobody else is planning major changes. :-)  Besides Nikita, I'm
addressing Dmitry and Xinchen because your names are on some minor hash
items on the 7.1 ideas wiki [4].

I'm thinking of a Robin Hood implementation with Universal hashing [5] (of
int keys and the string hashes).  I haven't touched any code yet, but think
I've worked out all the details in my head, and am ready to take a stab at
it.  I think it's fairly simple to get the basics working; and when I see
how that goes, I can do the additional optimizations I have in mind that it
allows (including reduced memory, on 64-bit at least).

Question: Can I use zval.u1.v.reserved ?  I guess I'll find out otherwise.
:-O


The string hash function itself is really a separate thing [6], but fasthash
[7] (not to be confused with "superfast") looks like a good one that I
missed before...  After thinking about things, I think we could even
keep/use a 64-bit hash on 32-bit arch.


Well, just wanted to mention it first if anyone has a comment. :-)  Should
be interesting, but no idea how it'll perform (lookups should be very, very
fast (upsizing also); but cache factors and inserts/deletes are wildcards).
Wish me luck!?


Thanks,
Matt

[1] https://marc.info/?l=php-internals&m=143444845304138&w=2
[2] https://marc.info/?t=145744248100001&r=1&w=2
[4] https://wiki.php.net/php-7.1-ideas
[5] https://en.wikipedia.org/wiki/Universal_hashing
[6] https://github.com/rurban/smhasher
[7] https://github.com/rurban/smhasher/blob/master/doc/fasthash64


--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

Reply via email to