Re: [PHP-DEV] zend_inline_hash_function reimplementation

Scott MacVicar Mon, 21 Jul 2008 11:16:01 -0700

Hi Andi,

The patch is attached for 5_3.

I've got some time allocated tomorrow to review all of Michal's patchesthat have been produced for the GSoC. I'll try to post some figures fromreal life apps.


Scott
Andi Gutmans wrote:

Hi Michal,

Can you please send a link to the patch so we can review? I didn't get
the attachment.

Thanks,
Andi

-----Original Message-----
From: Michal Dziemianko [mailto:[EMAIL PROTECTED]
Sent: Monday, July 21, 2008 8:29 AM
To: internals@lists.php.net
Subject: [PHP-DEV] zend_inline_hash_function reimplementation

Hello,
I have looked into Zend/zend_hash.h and I guess it might be sped up a

little.

So far it uses D. Bernstein's hash which is quite fast, but I think it

might

be worth replacing it with MurmurHash. I have tried comparison of

speed for

them (both as separate C programs, and compiled into PHP 5_3). Results

for

REAL keys (collected on running web server) are at the bottom of this

page:

http://212.85.117.53/gsoc/
index.php?option=com_content&view=article&id=65:hash-functions-for-
hash-tables&catid=34:profiling&Itemid=54

Murmur is Public Domain, and its author is maintaining it all the time

so I

guess it will be faster and better. There is comparison of

distribution at

both the page I gave and here: http:// murmurhash.googlepages.com/

The patch for 5_3 (applicable to HEAD) as attachment

Index: zend_hash.h
===================================================================
RCS file: /repository/ZendEngine2/zend_hash.h,v
retrieving revision 1.78.2.2.2.2.2.4
diff -u -d -r1.78.2.2.2.2.2.4 zend_hash.h
--- zend_hash.h 29 Apr 2008 08:15:17 -0000      1.78.2.2.2.2.2.4
+++ zend_hash.h 20 Jul 2008 12:37:41 -0000
@@ -222,65 +222,47 @@
 ZEND_API int zend_hash_rehash(HashTable *ht);
 
 /*
- * DJBX33A (Daniel J. Bernstein, Times 33 with Addition)
- *
- * This is Daniel J. Bernstein's popular `times 33' hash function as
- * posted by him years ago on comp.lang.c. It basically uses a function
- * like ``hash(i) = hash(i-1) * 33 + str[i]''. This is one of the best
- * known hash functions for strings. Because it is both computed very
- * fast and distributes very well.
- *
- * The magic of number 33, i.e. why it works better than many other
- * constants, prime or not, has never been adequately explained by
- * anyone. So I try an explanation: if one experimentally tests all
- * multipliers between 1 and 256 (as RSE did now) one detects that even
- * numbers are not useable at all. The remaining 128 odd numbers
- * (except for the number 1) work more or less all equally well. They
- * all distribute in an acceptable way and this way fill a hash table
- * with an average percent of approx. 86%. 
- *
- * If one compares the Chi^2 values of the variants, the number 33 not
- * even has the best value. But the number 33 and a few other equally
- * good numbers like 17, 31, 63, 127 and 129 have nevertheless a great
- * advantage to the remaining numbers in the large set of possible
- * multipliers: their multiply operation can be replaced by a faster
- * operation based on just one shift plus either a single addition
- * or subtraction operation. And because a hash function has to both
- * distribute good _and_ has to be very fast to compute, those few
- * numbers should be preferred and seems to be the reason why Daniel J.
- * Bernstein also preferred it.
- *
- *
- *                  -- Ralf S. Engelschall <[EMAIL PROTECTED]>
+ * MURMUR Hash - http://murmurhash.googlepages.com/
  */
-
-static inline ulong zend_inline_hash_func(const char *arKey, uint nKeyLength)
+#define MURMUR_SEED 5831
+#define MURMUR_M 0x5bd1e995
+#define MURMUR_R 24
+static inline ulong zend_inline_hash_func(const char *key, uint len)
 {
-       register ulong hash = 5381;
+       unsigned int h = MURMUR_SEED ^ len;
+       unsigned char * data = (unsigned char *)key;
 
-       /* variant with the hash unrolled eight times */
-       for (; nKeyLength >= 8; nKeyLength -= 8) {
-               hash = ((hash << 5) + hash) + *arKey++;
-               hash = ((hash << 5) + hash) + *arKey++;
-               hash = ((hash << 5) + hash) + *arKey++;
-               hash = ((hash << 5) + hash) + *arKey++;
-               hash = ((hash << 5) + hash) + *arKey++;
-               hash = ((hash << 5) + hash) + *arKey++;
-               hash = ((hash << 5) + hash) + *arKey++;
-               hash = ((hash << 5) + hash) + *arKey++;
-       }
-       switch (nKeyLength) {
-               case 7: hash = ((hash << 5) + hash) + *arKey++; /* 
fallthrough... */
-               case 6: hash = ((hash << 5) + hash) + *arKey++; /* 
fallthrough... */
-               case 5: hash = ((hash << 5) + hash) + *arKey++; /* 
fallthrough... */
-               case 4: hash = ((hash << 5) + hash) + *arKey++; /* 
fallthrough... */
-               case 3: hash = ((hash << 5) + hash) + *arKey++; /* 
fallthrough... */
-               case 2: hash = ((hash << 5) + hash) + *arKey++; /* 
fallthrough... */
-               case 1: hash = ((hash << 5) + hash) + *arKey++; break;
-               case 0: break;
-EMPTY_SWITCH_DEFAULT_CASE()
+       while(len >= 4)
+       {
+               unsigned int k = *(unsigned int *)data;
+
+               k *= MURMUR_M; 
+               k ^= k >> MURMUR_R; 
+               k *= MURMUR_M; 
+                       
+               h *= MURMUR_M; 
+               h ^= k;
+
+               data += 4;
+               len -= 4;
        }
-       return hash;
+               
+       // Handle the last few bytes of the input array
+       switch(len)
+       {
+               case 3: h ^= data[2] << 16;
+               case 2: h ^= data[1] << 8;
+               case 1: h ^= data[0];
+                               h *= MURMUR_M;
+       };
+
+       // Do a few final mixes of the hash to ensure the last few
+       // bytes are well-incorporated.
+       h ^= h >> 13;
+       h *= MURMUR_M;
+       h ^= h >> 15;
+
+       return h; 
 }

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

Re: [PHP-DEV] zend_inline_hash_function reimplementation

Reply via email to