Gidon, thanks for the explanation about tampering. That sounds good to me. Jim, I think we are definitely a bit closer with your PR. That explains a lot of the algorithm clearly.
After reading through the requirements for bloom filter sizing, I think we should probably look into adding compression, even if it is to signal that the bloom filter is uncompressed. Because we require that the number of blocks is a power of 2, there will probably be some room for compression to work. At least getting a compression union into the bloom filter header will help us with compatibility later if we choose to add compression schemes. It think it may also be worth the overhead of naive compression in some cases, though I didn't thoroughly read through that reference yet. On Sun, Aug 4, 2019 at 7:56 PM Jim Apple <[email protected]> wrote: > On 2019/08/03 20:42:10, Ryan Blue <[email protected]> wrote: > > - Should the bloom filters support compression? If the strategy for a > > lower false-positive rate is to under-fill the multi-block filter, > then > > would compression help reduce the storage cost? > > Eventually, it might be useful to support compression. The question of how > to do so in the unique case of Bloom filters, though, deserves some thought > and planning that I think would be most useful in a follow-on. > Mitzenmacher's "Compressed Bloom Filters" [0] has some discussion of the > issues. > > There are complex tradeoffs between compressed size, uncompressed size, > number of hash functions, and the false positive rate, and this is only one > direction we could go to reduce the false positive rate while keeping the > storage space constant. For instance, there are several other filter > algorithms that achieve the best possible false positive rate (for any > filter, compressed or not), but that pay for it with complexity, query > time, and construction time. These include Golomb filters, XORSAT filters, > Porat's matrix filters, Pagh et. al's filters, etc. > > [0] http://www.eecs.harvard.edu/~michaelm/NEWWORK/postscripts/cbf2.pdf > -- Ryan Blue Software Engineer Netflix
