Hi,

I while ago I did some experimentation with simple bit-op based string
hash
functions.  I.e., no multiplications / divides in the hash loop.

The best I found was:

int hash_fn (char * p)
{
  int hash = 0;
  while (*p) {
     hash = hash + *p;
     // Rotate a 31 bit field 7 bits:
     hash = ((hash << 7) | (hash >> 24)) & 0x7fffffff;
  }
  return hash;
}

[I haven't kept my test program / data set - if anyone compares the
above
 to the others functions mentioned on the list, let me know.]

The 31 and 7 were determined experimentally.  But the 31 has a
theoretical
explanation (which might even be valid):

The rotate is equivalent to a multiplication by x**7 in Z_2[P=0],
where P is the polynomial x**31 - 1 (over Z_2).
Presumably the "best" P would be irreducible - but that would have more
bits set in the polynomial, making reduction harder.  A compromise is to
choose P in the form x**N - 1 but with relatively few factors.
X**31 - 1 is such a P.

Also, a 32 bit rotate (modulo X**32 - 1, which is equal
to (X - 1) ** 32 over Z_2), came out pretty badly.

One thing that shouldn't be forgotten about hashing for hash tables
is that you have to reduce the hash value to the required range - doing
that well greatly reduces the difference between various hash functions.

Ralph.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to