On Sat, Mar 25, 2006 at 07:26:51PM -0500, John Denker wrote:
> Executive summary: Small samples do not always exhibit "average" behavior.

That's not the whole problem: you have to be looking at the right "average" too. For the long-run encodability of a set of IID symbols, each produced with probability p_i, that average is the Shannon entropy. If you're interested in the mean number of guesses (per symbol) required to guess a long word formed from these symbols, then you should be looking at (\sum_i \sqrt{p_i})^2. Other metrics (min-entropy, work factor, ...) require other "averages". To see this behaviour, you need both a large sample and the right type of average to match your problem (and I've assumed IID).

	David.
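
To make the difference between these "averages" concrete, here is a minimal Python sketch that evaluates three of them on the same distribution. The function names and the example distribution are illustrative assumptions, not anything from the thread, and the symbols are taken to be IID as above.

```python
import math

def shannon_entropy(p):
    """Shannon entropy in bits: -sum_i p_i * log2(p_i)."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def min_entropy(p):
    """Min-entropy in bits: -log2 of the most likely symbol's probability."""
    return -math.log2(max(p))

def guess_factor(p):
    """The quantity (sum_i sqrt(p_i))^2 quoted above for guessing long words."""
    return sum(math.sqrt(x) for x in p) ** 2

# Hypothetical example distribution over four symbols (sums to 1).
p = [0.5, 0.25, 0.125, 0.125]

print("Shannon entropy (bits):", shannon_entropy(p))
print("2**Shannon entropy:    ", 2 ** shannon_entropy(p))
print("min-entropy (bits):    ", min_entropy(p))
print("(sum sqrt(p_i))**2:    ", guess_factor(p))
```

For this skewed distribution the three numbers disagree: the Shannon entropy is 1.75 bits (so 2**1.75 is about 3.36), the min-entropy is 1 bit, and (\sum_i \sqrt{p_i})^2 is about 3.66. For a uniform distribution such as [0.25, 0.25, 0.25, 0.25] they all coincide (2 bits, and a factor of 4), which is why the choice of "average" only matters once the symbols are not equally likely.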