Thanks Nicolas, I was after the actual size of it though, I assume you have no disclosable numbers? Just curious.

If not, I guess the best is to run YCSB or even PE (PerformanceEvaluation) to load a Bloom-filter-enabled table and then check the HFile output? Does that print the BF sizes too?
Lars

On Tue, Jan 25, 2011 at 4:11 PM, Nicolas Spiegelberg <[email protected]> wrote:
> A great article for Bloom Filter rules of thumb:
>
> http://corte.si/posts/code/bloom-filter-rules-of-thumb/
>
> Note that only rules #1 & #2 apply for our use case. Rule #3, while true,
> isn't as big a worry because we use combinatorial generation for hashes, so
> the number of 'expensive' hash calculations is 2, no matter how many hash
> functions need to be generated. Note that this drastically (400%+) sped up
> our BloomFilter.add() speed.
>
> Sent from my iPhone
>
> On Jan 25, 2011, at 6:22 AM, "Lars George" <[email protected]> wrote:
>
>> Hi,
>>
>> (Probably aimed at Nicolas)
>>
>> Do we have a (rough) formula of overhead, i.e. the size of the
>> bloomfilters for row and col granularity as for example depending on
>> the KV count and average sizes (as reported by the HFile main()
>> helper)?
>>
>> Thanks,
>> Lars
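[Editor's note: the rough overhead formula Lars asks about can be approximated with the standard Bloom filter sizing math. This is a back-of-the-envelope sketch, not HBase's exact on-disk layout, which adds block and metadata overhead on top.]

```python
import math

def bloom_filter_bits(n_keys, false_positive_rate):
    """Optimal number of bits for n keys at a target false-positive rate:
    m = -n * ln(p) / (ln 2)^2."""
    return math.ceil(-n_keys * math.log(false_positive_rate) / (math.log(2) ** 2))

def optimal_hash_count(m_bits, n_keys):
    """Optimal number of hash functions: k = (m / n) * ln 2."""
    return max(1, round(m_bits / n_keys * math.log(2)))

# Example: a row-level bloom over 10M keys at a 1% false-positive rate
m = bloom_filter_bits(10_000_000, 0.01)
k = optimal_hash_count(m, 10_000_000)
print(m // 8 // 1024 // 1024, "MiB,", k, "hash functions")  # ~11 MiB, k = 7
```

So the overhead scales linearly with KV count (per-row or per-row+col, depending on granularity) and depends only logarithmically on the target error rate, independent of average KV size.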
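[Editor's note: the "combinatorial generation" Nicolas describes is the double-hashing trick: derive all k probe positions from just two real hash computations. A minimal sketch below, with hypothetical helper names; HBase's actual ByteBloomFilter code differs in its hash implementation.]

```python
import hashlib

def two_base_hashes(key: bytes):
    """The only two 'expensive' hash computations, regardless of k.
    (Illustrative: one MD5 digest split into two 64-bit halves.)"""
    digest = hashlib.md5(key).digest()
    return (int.from_bytes(digest[:8], "big"),
            int.from_bytes(digest[8:], "big"))

def probe_positions(key: bytes, k: int, m_bits: int):
    """Combinatorial generation: position_i = (h1 + i * h2) mod m,
    so k positions cost only two real hashes plus k multiply-adds."""
    h1, h2 = two_base_hashes(key)
    return [(h1 + i * h2) % m_bits for i in range(k)]

positions = probe_positions(b"row-1234", k=7, m_bits=95_850_584)
print(len(positions))  # 7 probe positions from 2 hash computations
```

That is why add() speed stays flat as k grows: the per-key cost is dominated by the two base hashes, not the number of hash functions.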
