Andries,


> > int hash_fn (char * p)
> > {
> >   int hash = 0;
> >   while (*p) {
> >      hash = hash + *p;
> >      // Rotate a 31 bit field 7 bits:
> >      hash = ((hash << 7) | (hash >> 24)) & 0x7fffffff;
> >   }
> >   return hash;
> > }

> 

> Hmm. This one goes in the "catastrophic" category.

> 
> For actual names:
> 
> N=557398, m=51 sz=2048, min 82, max 4002, av 272.17, stddev 45122.99
> 
> For generated names:
> 
> N=557398, m=51 sz=2048, min 0, max 44800, av 272.17, stddev 10208445.83
> 

Instead of masking the hash value down to 11 bits you could try:

index = (hash ^ (hash >> 11) ^ (hash >> 22)) & 0x7ff;

I ran a quick test which gave fairly good results with that: 12871
identifiers
from a source tree) gave a mean square bucket size of 45.65, expected
value for a random function is 45.78.

That change might improve some of your other hashes as well - there
doesn't
seem to be much point in computing a 32 bit value only to throw 20 bits
away -
stirring in the extra bits makes much more sense to me.

> > The rotate is equivalent to a multiplication by x**7 in Z_2[P=0],

> > where P is the polynomial x**31 - 1 (over Z_2).
> > Presumably the "best" P would be irreducible - but that would have more
> > bits set in the polynomial, making reduction harder.  A compromise is to
> > choose P in the form x**N - 1 but with relatively few factors.
> > X**31 - 1 is such a P.
> 
> It has seven irreducible factors. Hardly "almost irreducible".

I didn't say it was.  "almost irreducible" polynomials with Hamming
weight two are pretty rare...  Relative to say, x**32 - 1 or x**24 - 1,
having 7 factors is good.

Ralph.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to