From: Jon Perryman Sent: Friday, 2 November 2012 10:10 AM
Clustering is a statistics term that describes the tendency of events to occur more often around some groups and less often around others. In this case, it is where we are trying to predict (or guess) values to make better use of the hash table storage.
To make it easier to understand, try to guess the first character of a word that I am thinking of. Did you guess X or Z? No, because there are very few words that begin with these letters.
No, you might well have chosen X or Z for that very reason.
On the other hand a larger number of words begin with T and A so these are more likely.
Actually, "S" would have been a better choice, because the number of words beginning with "S" is several times the number commencing with "T".
The cluster size of T and A is significantly larger than that of X and Z. You can see the clustering at http://www.ask.com/wiki/Letter_frequency if you scroll down to the graphs.
You can also see it by looking in any dictionary.
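
To tie this back to hash table storage, here is a minimal Python sketch of what clustering would look like if the hash were nothing more than the first letter of the word. The function name first_letter_buckets and the word list are made up for illustration; nobody in this thread has said that is the actual hash in use.

```python
from collections import Counter

def first_letter_buckets(words):
    """Count the words per bucket when the 'hash' is just the first letter.

    Hypothetical helper for illustration only, not a real hash function.
    """
    return Counter(w[0].upper() for w in words if w)

# Tiny made-up sample, not a real dictionary.
sample = ["stone", "salt", "sun", "stream", "table", "tree", "apple", "zebra"]
buckets = first_letter_buckets(sample)

# Print the buckets from fullest to emptiest to show the uneven spread.
for letter, count in buckets.most_common():
    print(letter, count)
```

Running the same function over a real word list would show the same kind of skew: a few buckets fill up while others stay nearly empty, which is the wasted storage that a better guess about the data is meant to avoid.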
