On May 1, 2009, at 3:02 PM, Matthew Toseland wrote: > [20:37:16] <evanbd> You standardize on a size for the bloom filters; > say 1MiB. > Then, if your store has 100GB of data, and needs 18MiB of bloom > filters, you > partition the keyspace into 18 chunks of roughly equal population. > Each > segment of the keyspace then gets its own bloom filter. > [20:38:24] <evanbd> Then, if my node has 100MiB of memory to spend > on my > peers' bloom filters, and 20 peers, I just ask each peer for up to 5 > bloom > filters. > [20:40:11] <toad_> evanbd: it's good to have a partial filter for > each node > yes > [20:40:24] <toad_> evanbd: however, you end up checking more filters > which > increases false positives > [20:40:52] <evanbd> No, the fp rate stays the same. > [20:41:18] <evanbd> Suppose your node has 18 filters, each with a > 1E-5 fp rate > [20:41:36] <evanbd> When I get a request, I compare it to your > node's filter > set. > [20:41:57] <evanbd> But only *one* of those filters gets checked, > since each > one covers a different portion of the keyspace
If requested to send 5 partial filters, which do you send? I presume those closest to the originator's present location. Looks like you actually may end up checking less filters overall (if the blooms from large peers are out-of-range of the request). Nice idea, Evan! On May 1, 2009, at 2:58 PM, Evan Daniel wrote: > The address of a bit can be represented in 32 bits trivially (for > bloom filters < 512MiB in size), so the 1-hour diff should consume > 13824*32/8=55296 bytes. That represents 15.36 bytes/s of traffic for > each peer, or 307.2B/s across 20 peers. Ok. I'm convinced. -- Robert Hailey -------------- next part -------------- An HTML attachment was scrubbed... URL: <https://emu.freenetproject.org/pipermail/devl/attachments/20090501/f6a44eeb/attachment.html>
