Thanks Nicolas, I was after the actual size of it though, I assume you have no disclosable numbers? Just curious.

If not, I guess the best is to run YCSB or even PE (PerformanceEvaluation) to load a Bloom-filter-enabled table and then check the HFile output? Does that print the BF sizes too?
Lars

On Tue, Jan 25, 2011 at 4:11 PM, Nicolas Spiegelberg <[email protected]> wrote:
> A great article for Bloom Filter rules of thumb:
>
> http://corte.si/posts/code/bloom-filter-rules-of-thumb/
>
> Note that only rules #1 & #2 apply for our use case. Rule #3, while true,
> isn't as big a worry because we use combinatorial generation for hashes, so
> the number of 'expensive' hash calculations is 2, no matter how many hash
> functions need to be generated. Note that this drastically (400%+) sped up
> our BloomFilter.add() speed.
>
> Sent from my iPhone
>
> On Jan 25, 2011, at 6:22 AM, "Lars George" <[email protected]> wrote:
>
>> Hi,
>>
>> (Probably aimed at Nicolas)
>>
>> Do we have a (rough) formula of overhead, i.e. the size of the
>> bloomfilters for row and col granularity as for example depending on
>> the KV count and average sizes (as reported by the HFile main()
>> helper)?
>>
>> Thanks,
>> Lars
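[Editor's note: the rough overhead formula Lars asks about can be approximated with the standard Bloom filter sizing math. This is a back-of-the-envelope sketch, not HBase's exact on-disk layout, which adds block and metadata overhead on top.]

```python
import math

def bloom_filter_bits(n_keys, false_positive_rate):
    """Optimal number of bits for n keys at a target false-positive rate:
    m = -n * ln(p) / (ln 2)^2."""
    return math.ceil(-n_keys * math.log(false_positive_rate) / (math.log(2) ** 2))

def optimal_hash_count(m_bits, n_keys):
    """Optimal number of hash functions: k = (m / n) * ln 2."""
    return max(1, round(m_bits / n_keys * math.log(2)))

# Example: a row-level bloom over 10M keys at a 1% false-positive rate
m = bloom_filter_bits(10_000_000, 0.01)
k = optimal_hash_count(m, 10_000_000)
print(m // 8 // 1024 // 1024, "MiB,", k, "hash functions")  # ~11 MiB, k = 7
```

So the overhead scales linearly with KV count (per-row or per-row+col, depending on granularity) and depends only logarithmically on the target error rate, independent of average KV size.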
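[Editor's note: the "combinatorial generation" Nicolas describes is the double-hashing trick: derive all k probe positions from just two real hash computations. A minimal sketch below, with hypothetical helper names; HBase's actual ByteBloomFilter code differs in its hash implementation.]

```python
import hashlib

def two_base_hashes(key: bytes):
    """The only two 'expensive' hash computations, regardless of k.
    (Illustrative: one MD5 digest split into two 64-bit halves.)"""
    digest = hashlib.md5(key).digest()
    return (int.from_bytes(digest[:8], "big"),
            int.from_bytes(digest[8:], "big"))

def probe_positions(key: bytes, k: int, m_bits: int):
    """Combinatorial generation: position_i = (h1 + i * h2) mod m,
    so k positions cost only two real hashes plus k multiply-adds."""
    h1, h2 = two_base_hashes(key)
    return [(h1 + i * h2) % m_bits for i in range(k)]

positions = probe_positions(b"row-1234", k=7, m_bits=95_850_584)
print(len(positions))  # 7 probe positions from 2 hash computations
```

That is why add() speed stays flat as k grows: the per-key cost is dominated by the two base hashes, not the number of hash functions.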
