Currently, the Bloom filters use 46 bits per key (23 buckets, 2-bit
counter per bucket) of RAM, using 16 hash functions.  This gives a
false positive rate of 1.5E-5.

Oddly enough, a simple hash table has lower memory usage.  Create an
array in memory of n-bit values, one value per slot in the salted-hash
store.  The value stored in the array is an n-bit hash of the key in
the corresponding store location.  On a given lookup, the odds of a
false positive are 2^-n.  Because the store does quadratic probing
with 4 slots, there are 4 lookups per key request.  The false positive
rate is then 2^-(n-2).  For n=18, we get a false positive rate of
1.5E-5.  n=16 would align on byte boundaries and save a tiny bit of
memory at a cost of a false positive rate of 6.1E-5.

The reason this works better than the Bloom filter is because a given
key can only go in a limited set of locations in the salted hash
store.  The Bloom filter works equally well whether or not that
restriction is present.  With this method, as the number of possible
locations grows, so does the false positive rate.  For low
associativity, the simple hash table wins.

This also makes updates and removal trivial.  However, we probably
can't share it with our neighbors without giving away our salt value.
For that, we probably want to continue planning to use Bloom filters.

Evan Daniel

Reply via email to