If a hashing scheme is working well there is almost no clustering.
Suppose we divide by 17, a prime, i.e., use it, in the jargon, as our
hashing modulus.  Remainders will have one of the 17 values:

0, 1, 2, . . . , 16.

If, after some goodly number of hashing operations, each of the hash
values 0, 1, 2, . . . , 16 has been generated the same or about the
same number of times, clustering does not occur.

For concreteness, suppose we do 170 divisions.  Then, if clustering
does not occur, there are about 10 remainders having the value 0,
about 10 having the value 1, about 10 having the value 2, etc., etc.

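What happens when the divisor used is composite is that remainders
cluster whenever the keys share a factor with it.  If the keys run
heavily to multiples of one of the divisor's prime factors, the
remainders are multiples of that factor too.  Those values then occur
more frequently than the others, and some values may never occur.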

For 36 we have 36 = 2 x 2 x 3 x 3, which is usually written 2^2 x 3^2
or 2**2 x 3**2.  Its prime factors are 2 and 3.  When it is used as a
divisor, even keys give only even remainders, and keys that are
multiples of 3 give only the remainders 0, 3, 6, . . . , 33.  If the
keys are mostly of that kind, remainders that are multiples of 2 or 3
turn up far more often than the rest.
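One line of algebra shows why shared factors matter.  Write a key k as k = qm + r, where m is the divisor and r the remainder.  If some g divides both k and m, it must also divide r:

```latex
k = q\,m + r,\quad 0 \le r < m
\qquad\Longrightarrow\qquad
\bigl(g \mid k \ \text{and}\ g \mid m\bigr) \ \Longrightarrow\ g \mid (k - q\,m) = r .
```

With m = 36 and g = 2, even keys can reach only the 18 even remainders.  With g = 3, multiples of 3 can reach only the 12 remainders 0, 3, . . . , 33.  With a prime m, the only possible g > 1 is m itself.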

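37, on the other hand, is prime, divisible only by 1 and itself.  Keys
that are even or multiples of 3 share no factor with it, so its use as
a divisor yields no clustering of this kind.  Only keys that are
themselves multiples of 37 would all pile up on the remainder 0.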
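A small Python sketch makes the contrast between 36 and 37 concrete.  The keys are a hypothetical patterned set, the first 360 multiples of 6, so every key shares the factors 2 and 3 with 36 but no factor with 37.  The helper name bucket_loads is also made up for illustration.

```python
from collections import Counter

def bucket_loads(keys, modulus):
    """Map each remainder actually produced by the keys to its count."""
    return Counter(k % modulus for k in keys)

# Hypothetical patterned keys: the first 360 multiples of 6 (0, 6, ..., 2154).
# Every key shares the factors 2 and 3 with 36 but no factor with 37.
keys = [6 * i for i in range(360)]

for m in (36, 37):
    loads = bucket_loads(keys, m)
    print(f"modulus {m}: {len(loads)} of {m} remainders used, "
          f"fullest holds {max(loads.values())} keys")
```

This prints "modulus 36: 6 of 36 remainders used, fullest holds 60 keys" and then "modulus 37: 37 of 37 remainders used, fullest holds 10 keys".  With 36, all 360 keys crowd into the remainders 0, 6, 12, 18, 24 and 30.  With 37, they spread over every remainder with 9 or 10 apiece.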

Never hesitate to ask notional gurus such questions.  A request for a
further explanation is always in order.

--jg
