Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The following page has been changed by izaakrubin:
http://wiki.apache.org/hadoop/Hbase/UsingBloomFilters

------------------------------------------------------------------------------
  Bloom filters can be enabled on a per-column family basis in Hbase.

- There are three bloom filter variants supported:
+ There are four bloom filter variants supported:
   1. A [http://portal.acm.org/citation.cfm?id=362692&dl=ACM&coll=portal bloom filter] as defined by Bloom in 1970.
   1. A [http://portal.acm.org/citation.cfm?id=343571.343572 counting bloom filter] as defined by Fan et al. in a ToN 2000 paper.
   1. A [http://www-rp.lip6.fr/site_npa/site_rp/_publications/740-rbf_cameraready.pdf retouched bloom filter] as described in the CoNEXT 2006 paper.
+  1. A [http://www.cse.fau.edu/~jie/research/publications/Publication_files/infocom2006.pdf dynamic bloom filter] as defined in the INFOCOM 2006 paper.

+ Bloom filters can be instantiated by specifying the vector size and the number of hash functions. Dynamic bloom filters require an additional argument, a threshold for the maximum number of keys to record in a row.
- There are two ways in which a bloom filter can be instantiated:
-  1. by supplying the estimated number of values, in which case HBase selects the number of hash functions to be 4 and computes the vector size from the formula
-  {{{size = number-of-values * number-of-hashfunctions / ln(2) }}}

+ Junit testing for these four bloom filters can be found in hbase.regionserver.!TestBloomFilters.
- This formula was presented in [http://www.eecs.harvard.edu/~michaelm/NEWWORK/postscripts/BloomFilterSurvey.pdf Network Applications of Bloom Filters: A Survey, by Broder and Mitzenmacher]
-  1.#2 by specifying the vector size and the number of hash functions explicitly.
-
- Both of these techniques are demonstrated in the Junit test hbase.!TestBloomFilters.

  '''Additional Resources:'''
