See http://en.wikipedia.org/wiki/Double_hashing for information on double hashing.
On Tue, Jan 25, 2011 at 8:11 AM, Nicolas Spiegelberg <[email protected]> wrote:

> A great article for Bloom filter rules of thumb:
>
> http://corte.si/posts/code/bloom-filter-rules-of-thumb/
>
> Note that only rules #1 & #2 apply to our use case. Rule #3, while true,
> isn't as big a worry because we use combinatorial generation for hashes, so
> the number of 'expensive' hash calculations is 2, no matter how many hash
> functions need to be generated. Note that this drastically (400%+) sped up
> our BloomFilter.add() speed.
>
> Sent from my iPhone
>
> On Jan 25, 2011, at 6:22 AM, "Lars George" <[email protected]> wrote:
>
> > Hi,
> >
> > (Probably aimed at Nicolas)
> >
> > Do we have a (rough) formula for the overhead, i.e. the size of the
> > bloom filters at row and column granularity, depending for example on
> > the KV count and average sizes (as reported by the HFile main() helper)?
> >
> > Thanks,
> > Lars
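The combinatorial generation Nicolas describes can be sketched as follows: only two base hashes are computed per key, and all k probe positions are derived arithmetically as g_i(x) = h1(x) + i * h2(x) (mod m). This is a minimal illustrative sketch, not HBase's actual implementation; the class name, hash mixers, and constants are assumptions.

```java
import java.util.BitSet;

// Sketch of a Bloom filter using combinatorial (double) hashing:
// two "expensive" hash computations per key, k derived probe positions.
// Illustrative only -- not the HBase BloomFilter implementation.
public class DoubleHashBloom {
    private final BitSet bits;
    private final int m;  // filter size in bits
    private final int k;  // number of derived hash functions

    public DoubleHashBloom(int m, int k) {
        this.bits = new BitSet(m);
        this.m = m;
        this.k = k;
    }

    // Two cheap, independently seeded base hashes (simple mixers for illustration).
    private static int h1(byte[] key) {
        int h = 0x811c9dc5;                       // FNV-1a offset basis
        for (byte b : key) { h ^= (b & 0xff); h *= 0x01000193; }
        return h;
    }

    private static int h2(byte[] key) {
        int h = 0x7ed55d16;                       // different seed than h1
        for (byte b : key) { h = h * 31 + (b & 0xff); }
        return h | 1;                             // never 0, so the k probes differ
    }

    public void add(byte[] key) {
        int a = h1(key), b = h2(key);
        for (int i = 0; i < k; i++) {
            bits.set(Math.floorMod(a + i * b, m)); // g_i = h1 + i*h2 (mod m)
        }
    }

    public boolean mightContain(byte[] key) {
        int a = h1(key), b = h2(key);
        for (int i = 0; i < k; i++) {
            if (!bits.get(Math.floorMod(a + i * b, m))) return false;
        }
        return true;
    }
}
```

This is why the speedup is independent of k: add() and mightContain() pay for two hash passes over the key regardless of how many bit positions are set or tested.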

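For the rough overhead formula Lars asks about, the standard (not HBase-specific) Bloom filter sizing math is: for n keys and a target false-positive rate p, the optimal filter needs m = -n * ln(p) / (ln 2)^2 bits, with k = (m/n) * ln 2 hash functions. A sketch of the arithmetic, with assumed names:

```java
// Standard Bloom filter sizing (not HBase-specific):
//   m = -n * ln(p) / (ln 2)^2  bits,  k = (m/n) * ln 2  hash functions.
public class BloomSizing {
    // Bits of filter needed per key for false-positive rate p.
    static double bitsPerKey(double p) {
        return -Math.log(p) / (Math.log(2) * Math.log(2));
    }

    // Optimal number of hash functions for that rate.
    static long optimalK(double p) {
        return Math.round(bitsPerKey(p) * Math.log(2));
    }

    public static void main(String[] args) {
        // Total overhead in bytes is roughly keyCount * bitsPerKey(p) / 8,
        // where keyCount is the row (or row+col) count from the HFile stats.
        System.out.printf("p=0.01: %.2f bits/key, k=%d%n",
                bitsPerKey(0.01), optimalK(0.01));
    }
}
```

At p = 1% this works out to about 9.6 bits (roughly 1.2 bytes) per key with k = 7, so a row-level bloom scales with the row count, and a row+col bloom with the KV count.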