On 2019/08/03 20:42:10, Ryan Blue <[email protected]> wrote: > - Should the bloom filters support compression? If the strategy for a > lower false-positive rate is to under-fill the multi-block filter, then > would compression help reduce the storage cost?
Eventually, it might be useful to support compression. The question of how to do so in the unique case of Bloom filters, though, deserves some thought and planning that I think would be most useful in a follow-on. Mitzenmacher's "Compressed Bloom Filters" [0] has some discussion of the issues. There are complex tradeoffs between compressed size, uncompressed size, number of hash functions, and the false positive rate, and this is only one direction we could go to reduce the false positive rate while keeping the storage space constant. For instance, there are several other filter algorithms that achieve the best possible false positive rate (for any filter, compressed or not), but that pay for it with complexity, query time, and construction time. These include Golomb filters, XORSAT filters, Porat's matrix filters, Pagh et. al's filters, etc. [0] http://www.eecs.harvard.edu/~michaelm/NEWWORK/postscripts/cbf2.pdf
