Martin Truebner (Trübner?) writes: <begin snippet> and I insist: the number of possible input values is 46656 and this is the number of output values exact 46656 (not more-fewer but identical) <end snippet>
It is of course immediate that for a set of n characters the number of permutations of them taken m at a time is n^m and that, in particular, 36^3 = 46656. If one calculates N hash values one has or at least at some point had N hash values. The obvious disposed of, what is of interest here is the distribution of these values. Consider the function f(i) = mod(i,2) for i = 0, 1, 2, 3, . . . Evaluating it for the first N elements of this sequence certainly yields N values, but what is interesting about them is not that there are N of them but that, crudely, half of them are 0 and half of them are 1. Now one can of course arrange things otherwise. The function f(i) = mod(2i + 1,2), i = 0, 1, 2, 3, . . . always has the value 1, and the function f(i) = mod(2i,2), i = 0, 1, 2, 3, . . . always has the value 0. I entered this discussion because an examination of clustering side effects is necessary for any hashing scheme. They may be important, or again they may not be, but it is essential that they be evaluated. What I got for my trouble was a lot of guff, much of it specious and some of it literal nonsense, from people who lack a minimal grasp of the issues involved. I shall take this to heart. In the future I shall avoid discussing mathematical topics, in which I have some competence, here just as I avoid discussing channel-programming topics, in which I have none. --jg
