See http://en.wikipedia.org/wiki/Double_hashing for information on double
hashing.
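The double-hashing trick referenced above, as used for Bloom filters, can be sketched as follows. This is an illustrative sketch, not HBase's actual implementation: the class name and the two base hash functions are hypothetical. The point is that only two "expensive" hashes are computed per key; the remaining probe positions are derived combinatorially as h1 + i*h2 mod m.

```java
import java.util.BitSet;

// Sketch of a Bloom filter whose k probe positions are derived from
// two base hashes via double hashing: g_i(x) = h1(x) + i*h2(x) mod m.
public class DoubleHashBloom {
    private final BitSet bits;
    private final int m; // number of bits in the filter
    private final int k; // number of probes per key

    public DoubleHashBloom(int m, int k) {
        this.bits = new BitSet(m);
        this.m = m;
        this.k = k;
    }

    // Two simple, independent base hashes for illustration only; a real
    // implementation would use a stronger hash (e.g. Murmur with two seeds).
    private static int h1(byte[] key) {
        int h = 0;
        for (byte b : key) h = 31 * h + (b & 0xff);
        return h;
    }

    private static int h2(byte[] key) {
        int h = 0;
        for (byte b : key) h = 131 * h + (b & 0xff);
        return h | 1; // keep h2 odd so probes spread when m is a power of two
    }

    public void add(byte[] key) {
        int a = h1(key), b = h2(key);
        // Only two real hash computations above; the k probes are cheap.
        for (int i = 0; i < k; i++) {
            bits.set(Math.floorMod(a + i * b, m));
        }
    }

    public boolean mightContain(byte[] key) {
        int a = h1(key), b = h2(key);
        for (int i = 0; i < k; i++) {
            if (!bits.get(Math.floorMod(a + i * b, m))) return false;
        }
        return true; // possibly a false positive, never a false negative
    }
}
```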

On Tue, Jan 25, 2011 at 8:11 AM, Nicolas Spiegelberg <[email protected]> wrote:

> A great article for Bloom Filter rules of thumb:
>
> http://corte.si/posts/code/bloom-filter-rules-of-thumb/
>
> Note that only rules #1 and #2 apply to our use case. Rule #3, while true,
> isn't as big a worry because we use combinatorial generation for hashes, so
> the number of 'expensive' hash calculations is 2, no matter how many hash
> functions need to be generated. This alone drastically (400%+) sped up
> our BloomFilter.add() throughput.
>
> Sent from my iPhone
>
> On Jan 25, 2011, at 6:22 AM, "Lars George" <[email protected]> wrote:
>
> > Hi,
> >
> > (Probably aimed at Nicolas)
> >
> > Do we have a (rough) formula for the overhead, i.e. the size of the
> > Bloom filters at row and column granularity, as a function of, for
> > example, the KV count and average sizes (as reported by the HFile
> > main() helper)?
> >
> > Thanks,
> > Lars
>
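For the rough sizing formula Lars asks about above, the standard Bloom filter math (not an HBase-specific figure) gives m = -n * ln(p) / (ln 2)^2 bits for n keys at target false-positive rate p, with k = (m/n) * ln 2 probes. A minimal sketch, assuming a hypothetical helper class:

```java
// Standard Bloom filter sizing formulas; estimates the bit-array size
// only, not any per-file metadata an implementation may add on top.
public class BloomSizing {
    // Bits needed to hold n keys at false-positive rate p.
    static long bitsNeeded(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    // Optimal number of hash probes for n keys in m bits.
    static int optimalK(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }
}
```

For example, 1 million keys at a 1% false-positive rate works out to roughly 9.6 bits per key (about 1.2 MB of filter) with 7 probes.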
