On Fri, May 1, 2009 at 4:32 PM, Robert Hailey <robert at freenetproject.org> wrote: > > On May 1, 2009, at 3:02 PM, Matthew Toseland wrote: > > [20:37:16] <evanbd> You standardize on a size for the bloom filters; say > 1MiB. > Then, if your store has 100GB of data, and needs 18MiB of bloom filters, > you > partition the keyspace into 18 chunks of roughly equal population. ?Each > segment of the keyspace then gets its own bloom filter. > [20:38:24] <evanbd> Then, if my node has 100MiB of memory to spend on my > peers' bloom filters, and 20 peers, I just ask each peer for up to 5 bloom > filters. > [20:40:11] <toad_> evanbd: it's good to have a partial filter for each node > yes > [20:40:24] <toad_> evanbd: however, you end up checking more filters which > increases false positives > [20:40:52] <evanbd> No, the fp rate stays the same. > [20:41:18] <evanbd> Suppose your node has 18 filters, each with a 1E-5 fp > rate > [20:41:36] <evanbd> When I get a request, I compare it to your node's > filter > set. > [20:41:57] <evanbd> But only *one* of those filters gets checked, since > each > one covers a different portion of the keyspace > > If requested to send 5 partial filters, which do you send? I presume those > closest to the originator's present location. > Looks like you actually may end up checking?less?filters overall (if the > blooms from large peers are out-of-range of the request).
Yes, that's a question worth considering. There are both performance and security issues involved, I think. Note that the partition could be a set of contiguous regions (allowing performance optimization around which piece of the keyspace you send info about), but it could just as easily be determined by a hash function instead. You still check the same number of filters overall -- one per peer. The difference is that for some peers you may have a partial filter set, and therefore sometimes check their filters, instead of deciding you don't have the memory for that peer's filter and never checking it. > Nice idea, Evan! Thanks! Evan
