Re: why bloom filter is only for row key?

DuyHai Doan Sun, 14 Sep 2014 11:46:28 -0700

Hello Philo

 Building a bloom filter for column names (what you call column key) is
technically possible but very expensive in terms of memory usage.

  The approximate formula for the space required by a bloom filter can
be found on slide 27 here:
http://fr.slideshare.net/quipo/modern-algorithms-and-data-structures-1-bloom-filters-merkle-trees

false positive chance ≈ 0.6185 ^ (m/n)  where m = number of bits for the
filter and n = number of distinct keys (assuming an optimal number of hash
functions)

For example, if you want to index 1 million rows, each having 100 000
columns on average, you will end up indexing 100 billion keys (row keys
& column names) with the bloom filter.

 Solving the above formula for m with a false positive chance of about 10%
gives m ≈ 4.8 * 10^11 bits ≈ 60 GB to allocate in RAM just for the bloom
filter on all row keys & column names ...
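
 The arithmetic above can be sketched in a few lines of Python. This is
only an illustration, not Cassandra code: the function names are made up
for the example, the 10% false positive target is inferred from the
numbers in this message rather than stated in it, and the sizing relation
m = -n * ln(p) / (ln 2)^2 is the standard one for a filter with an optimal
number of hash functions (equivalent to p ≈ 0.6185 ^ (m/n)).

```python
import math


def bloom_bits(n_keys: int, p: float) -> float:
    """Bits needed to hold n_keys at false positive chance p.

    Assumes an optimal number of hash functions.
    """
    return -n_keys * math.log(p) / (math.log(2) ** 2)


def false_positive_chance(bits: float, n_keys: int) -> float:
    """Approximate false positive chance for a filter of the given size."""
    return 0.6185 ** (bits / n_keys)


rows = 1_000_000
columns_per_row = 100_000
target_p = 0.10  # assumption: inferred from the 4.8 * 10^11 bits figure

# Filter on row keys AND column names: 10^11 distinct keys.
n_all = rows * columns_per_row
m_all = bloom_bits(n_all, target_p)
gb_all = m_all / 8 / 1e9

# Filter on row keys only: 10^6 distinct keys.
m_rows = bloom_bits(rows, target_p)
mb_rows = m_rows / 8 / 1e6

print(f"row keys + column names: {m_all:.2e} bits, about {gb_all:.0f} GB")
print(f"row keys only:           {m_rows:.2e} bits, about {mb_rows:.1f} MB")
```

 With the same target, a filter on row keys alone needs well under a
megabyte, which is the gap that makes the per-column filter so costly.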

 Regards

 Duy Hai DOAN

On Sun, Sep 14, 2014 at 11:22 AM, Philo Yang <ud1...@gmail.com> wrote:

> Hi all,
>
> After reading some docs, I found that the bloom filter is built on row
> keys, not on column keys. Can anyone tell me the reasoning behind not
> building the bloom filter on column keys? Would it be a good idea to offer
> a table property to choose between row key and primary key for what the
> bloom filter is built on?
>
> Thanks,
> Philo Yang
>
>
