On Thu, 2016-04-28 at 19:04 -0500, Matt Wilmas wrote:
> Hi all,
> 
> Last June, it was briefly mentioned about changing PHP's string hash
> function [1] (DJB33 *seems* pretty horrible, after all, as far as
> collisions...).  So 8 months ago I tried almost, if not, a half-dozen 
> of
> them (including Murmur3) that were simple enough to quickly toss in.
> 
> The result?  In all cases, I believe, fewer instructions (in
> zend_hash_find,
> and the hashing function), BUT also a slight-to-small increase in
> cache
> misses in Wordpress and other scripts...
> 
> And in a test filling an array with a million "string_$i" keys (high
> collision pattern for DJB33?), the speed was halved by the *huge*
> cache miss
> increase. :-/
> 
> I couldn't quite understand what was happening, where, if there were
> fewer
> collisions...  Misses all spread out in the hash-array?
> 
> So there didn't seem to be anything useful to gain there.
> 
> 
> Now, after seeing Bogdan's hash optimization idea last month [2], and
> reading Nikita's blog post again, I had some ideas I'd like to try --
> assuming nobody else is planning major changes. :-)  Besides Nikita,
> I'm
> addressing Dmitry and Xinchen because your names are on some minor
> hash
> items on the 7.1 ideas wiki [4].
> 
> I'm thinking of a Robin Hood implementation with Universal hashing
> [5] (of
> int keys and the string hashes).  I haven't touched any code yet, but
> think
> I've worked out all the details in my head, and am ready to take a
> stab at
> it.  I think it's fairly simple to get the basics working; and when I
> see
> how that goes, I can do the additional optimizations I have in mind
> that it
> allows (including reduced memory, on 64-bit at least).
> 
> Question: Can I use zval.u1.v.reserved ?  I guess I'll find out
> otherwise.
> :-O
> 
> 
> The string hash function itself is really a separate thing [6], but
> fasthash
> [7] (not to be confused with "superfast") looks like a good one that
> I
> missed before...  After thinking about things, I think we could even
> keep/use a 64-bit hash on 32-bit arch.
> 
> 
> Well, just wanted to mention it first if anyone has a comment. :-
> )  Should
> be interesting, but no idea how it'll perform (lookups should be
> very, very
> fast (upsizing also); but cache factors and inserts/deletes are
> wildcards).
> Wish me luck!?
> 
> 
> Thanks,
> Matt
> 
> [1] https://marc.info/?l=php-internals&m=143444845304138&w=2
> [2] https://marc.info/?t=145744248100001&r=1&w=2
> [4] https://wiki.php.net/php-7.1-ideas
> [5] https://en.wikipedia.org/wiki/Universal_hashing
> [6] https://github.com/rurban/smhasher
> [7] https://github.com/rurban/smhasher/blob/master/doc/fasthash64
> 
> 

Have you taken a look a go's internal hashing function?

On platforms supporting AES-NI they use the AESENC instruction to get a
fast hash, but with also some protection against collision attacks. 

https://github.com/golang/go/blob/0104a31b8fbcbe52728a08867b26415d282c3
5d2/src/runtime/asm_amd64.s#L870

Jared

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

Reply via email to