I had mentioned that I was hoping to remember/find the appropriate formula for the probability of a hash bucket reaching its limit. I got out my prob-stat book from I-won't-mention-how-many years ago.
Suppose there's a limit of L connections to be put in a hash table with B buckets, each bucket allowed to hold a max of M entries. I assume that the hash function maps a connection to each bucket with equal probability, P = 1/B. Suppose the table is full, meaning it has L connections in it. The probability that a given bucket has n connections in it is: c(L,n) (P ^ n) ((1-P) ^ (L-n)) [called binomial distribution] where c(L,n) is the number of ways of choosing n things out of L, which is L! / (L-n)! n!, i.e., 10x9x8/3x2x1 ways to choose 3 out of 10. The binomial formula comes from the fact that we've chosen n connections out of L to put in this bucket, each of those with probability P, and of course the other L-n went into other buckets with probability 1-P. BTW, this is closely approximated by a Poisson distribution. Anyhow, on to some answers. My initial guess was that with L/2 buckets (so the average bucket length is 2), max bucket size M=10 would result in only a microscopic chance of overflow. The chance is bigger than I expected, about 1E-5. Raising M to 15 reduces it to about 1E-10 and raising M to 20 makes it about 1E-15, which seems quite respectable. There's a pretty straight forward tradeoff between space and time. Instead of increasing the time (# iterations = M) you can increase the space (# buckets). Changing from L/2 buckets to L we get M=10 probability 1E-9 of filling a bucket M=15 probability 1E-14 M=20 probability 1E-20 And while we're at it, with 2L buckets (average occupancy .5) M=10 prob ~ 1E-11 M=15 prob ~ 1E-18 M=20 prob ~ 1E-25 My guess is that the amount of time required to do 20 iterations comparing connection data with elements of a list in a bucket is very small compared to the amount of time otherwise spent on a packet. On the other hand, a hash bucket is also pretty cheap. The data for a connection takes much more space than 2 hash buckets, or even 10. And with 10L buckets the chance of overflow at M=10 is only 1E-19, for M=20 it's somewhere around 1E-40. So, there you have it.