Martin Truebner (Trübner?) writes:

<begin snippet>
and I insist: the number of possible input values is 46656 and this is
the number of output values exact 46656 (not more-fewer but identical)
<end snippet>

It is of course immediate that for a set of n characters the number of
permutations of them taken m at a  time is n^m and that, in
particular, 36^3 = 46656.

If one calculates N hash values one has or at least at some point had
N hash values.  The obvious disposed of, what is of interest here is
the distribution of these values.

Consider the function f(i) = mod(i,2) for i = 0, 1, 2, 3,  . . .
Evaluating it for the first N elements of this sequence certainly
yields N values, but what is interesting about them is not that there
are N of them but that, crudely, half of them are 0 and half of them
are 1.

Now one can of course arrange things otherwise.  The function

f(i) = mod(2i + 1,2), i = 0, 1, 2, 3, . . .

always has the value 1, and the function

f(i) = mod(2i,2), i = 0, 1, 2, 3, . . .

always has the value 0.

I entered this discussion because an examination of clustering side
effects is necessary for any hashing scheme.  They may be important,
or again they may not be, but it is essential that they be evaluated.

What I got for my trouble was a lot of guff, much of it specious and
some of it literal nonsense, from people who lack a minimal grasp of
the issues involved.  I shall take this to heart.  In the future I
shall avoid discussing mathematical topics, in which I have some
competence, here just as I avoid discussing channel-programming
topics, in which I have none.

--jg

Reply via email to