Hey, > Am 29.4.2016 um 02:04 schrieb Matt Wilmas <php_li...@realplain.com>: > > Hi all, > > Last June, it was briefly mentioned about changing PHP's string hash > function [1] (DJB33 *seems* pretty horrible, after all, as far as > collisions...). So 8 months ago I tried almost, if not, a half-dozen of > them (including Murmur3) that were simple enough to quickly toss in. > > The result? In all cases, I believe, fewer instructions (in zend_hash_find, > and the hashing function), BUT also a slight-to-small increase in cache > misses in Wordpress and other scripts... > > And in a test filling an array with a million "string_$i" keys (high > collision pattern for DJB33?), the speed was halved by the *huge* cache miss > increase. :-/ > > I couldn't quite understand what was happening, where, if there were fewer > collisions... Misses all spread out in the hash-array? > > So there didn't seem to be anything useful to gain there. > > > Now, after seeing Bogdan's hash optimization idea last month [2], and > reading Nikita's blog post again, I had some ideas I'd like to try -- > assuming nobody else is planning major changes. :-) Besides Nikita, I'm > addressing Dmitry and Xinchen because your names are on some minor hash > items on the 7.1 ideas wiki [4]. > > I'm thinking of a Robin Hood implementation with Universal hashing [5] (of > int keys and the string hashes). I haven't touched any code yet, but think > I've worked out all the details in my head, and am ready to take a stab at > it. I think it's fairly simple to get the basics working; and when I see > how that goes, I can do the additional optimizations I have in mind that it > allows (including reduced memory, on 64-bit at least). > > Question: Can I use zval.u1.v.reserved ? I guess I'll find out otherwise. > :-O
No, not really. zval.u1 gets typically overwritten on each type assignment. (As type assigns typically happen via 32 bit type_info field directly for performance reasons.) Eventually you may find a way to split up u2 in an useful way? > > The string hash function itself is really a separate thing [6], but fasthash > [7] (not to be confused with "superfast") looks like a good one that I > missed before... After thinking about things, I think we could even > keep/use a 64-bit hash on 32-bit arch. > > > Well, just wanted to mention it first if anyone has a comment. :-) Should > be interesting, but no idea how it'll perform (lookups should be very, very > fast (upsizing also); but cache factors and inserts/deletes are wildcards). > Wish me luck!? Good luck! :-) > > Thanks, > Matt > > [1] https://marc.info/?l=php-internals&m=143444845304138&w=2 > [2] https://marc.info/?t=145744248100001&r=1&w=2 > [4] https://wiki.php.net/php-7.1-ideas > [5] https://en.wikipedia.org/wiki/Universal_hashing > [6] https://github.com/rurban/smhasher > [7] https://github.com/rurban/smhasher/blob/master/doc/fasthash64 Bob