On May 1, 2009, at 3:02 PM, Matthew Toseland wrote:

> [20:37:16] <evanbd> You standardize on a size for the bloom filters; say 1MiB. Then, if your store has 100GB of data, and needs 18MiB of bloom filters, you partition the keyspace into 18 chunks of roughly equal population. Each segment of the keyspace then gets its own bloom filter.
> [20:38:24] <evanbd> Then, if my node has 100MiB of memory to spend on my peers' bloom filters, and 20 peers, I just ask each peer for up to 5 bloom filters.
> [20:40:11] <toad_> evanbd: it's good to have a partial filter for each node yes
> [20:40:24] <toad_> evanbd: however, you end up checking more filters which increases false positives
> [20:40:52] <evanbd> No, the fp rate stays the same.
> [20:41:18] <evanbd> Suppose your node has 18 filters, each with a 1E-5 fp rate
> [20:41:36] <evanbd> When I get a request, I compare it to your node's filter set.
> [20:41:57] <evanbd> But only *one* of those filters gets checked, since each one covers a different portion of the keyspace
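
For anyone who wants the partitioning spelled out, here is a minimal sketch, not actual Freenet code: one fixed-size filter per keyspace segment, with a lookup touching only the single filter that covers the key's segment, so the false-positive rate is that of one filter. The class name, the use of a plain BitSet, the two derived hashes, and the equal-width segments are all my own simplifications (the proposal uses roughly equal population, which only changes how the boundaries are chosen).

    import java.util.BitSet;

    /** Sketch: one fixed-size Bloom filter per keyspace segment. */
    public class SegmentedBloomStore {
        private static final int FILTER_BITS = 8 * 1024 * 1024; // 1MiB of bits per filter
        private final BitSet[] filters;
        private final int segments;

        public SegmentedBloomStore(int segments) {
            this.segments = segments;
            this.filters = new BitSet[segments];
            for (int i = 0; i < segments; i++)
                filters[i] = new BitSet(FILTER_BITS);
        }

        // Equal-width slices of [0,1) for simplicity; equal-population
        // segments would only move the boundaries.
        private int segmentOf(double location) {
            return Math.min(segments - 1, (int) (location * segments));
        }

        public void add(double location, long keyHash) {
            BitSet f = filters[segmentOf(location)];
            // Two derived hash functions for brevity; a real filter uses
            // k hashes chosen for the target false-positive rate.
            f.set((int) Long.remainderUnsigned(keyHash, FILTER_BITS));
            f.set((int) Long.remainderUnsigned(keyHash * 0x9E3779B97F4A7C15L, FILTER_BITS));
        }

        /** Only the one filter covering the key's segment is consulted. */
        public boolean mightContain(double location, long keyHash) {
            BitSet f = filters[segmentOf(location)];
            return f.get((int) Long.remainderUnsigned(keyHash, FILTER_BITS))
                && f.get((int) Long.remainderUnsigned(keyHash * 0x9E3779B97F4A7C15L, FILTER_BITS));
        }
    }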

If requested to send 5 partial filters, which do you send? I presume  
those closest to the originator's present location.

Looks like you may actually end up checking fewer filters overall (if  
the partial filters you hold from a large peer are out of range of the  
request, you skip that peer entirely).

Nice idea, Evan!
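
On the "which five do you send" question above: the following is purely my guess at what that presumption would look like in code, assuming equal-width segments and circular keyspace distance. None of these names exist in the tree.

    import java.util.Arrays;
    import java.util.Comparator;

    /** Hypothetical selection of which partial filters to send a peer:
        the segments whose centres lie closest (in circular keyspace
        distance) to that peer's current location. */
    public class FilterSelection {
        static double circularDistance(double a, double b) {
            double d = Math.abs(a - b);
            return Math.min(d, 1.0 - d);
        }

        /** Indices of the 'count' segments nearest peerLocation, out of
            'segments' equal-width segments of [0,1). */
        static Integer[] segmentsToSend(int segments, double peerLocation, int count) {
            Integer[] idx = new Integer[segments];
            for (int i = 0; i < segments; i++) idx[i] = i;
            Arrays.sort(idx, Comparator.comparingDouble(
                    i -> circularDistance((i + 0.5) / segments, peerLocation)));
            return Arrays.copyOf(idx, Math.min(count, segments));
        }
    }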

On May 1, 2009, at 2:58 PM, Evan Daniel wrote:
> The address of a bit can be represented in 32 bits trivially (for
> bloom filters < 512MiB in size), so the 1-hour diff should consume
> 13824*32/8=55296 bytes.  That represents 15.36 bytes/s of traffic for
> each peer, or 307.2B/s across 20 peers.

Ok. I'm convinced.
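
To spell the arithmetic out (13824 is presumably the number of changed bits per hour from earlier in the thread, and the class and method names below are mine): each changed bit is shipped as its 32-bit position, so a 1-hour diff of 13824 changes is 13824 x 4 = 55296 bytes, i.e. 15.36 B/s per peer and 307.2 B/s over 20 peers.

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;

    /** Sketch of the diff encoding: each changed bit sent as its 32-bit
        position within the filter (enough for any filter < 512MiB). */
    public class BloomDiff {
        static byte[] encode(int[] changedBitPositions) throws IOException {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            for (int pos : changedBitPositions)
                out.writeInt(pos);          // 4 bytes per changed bit
            out.flush();
            return buf.toByteArray();       // 13824 changes -> 55296 bytes
        }

        public static void main(String[] args) {
            int changesPerHour = 13824;                   // figure quoted above
            int bytesPerHour = changesPerHour * 32 / 8;   // 55296 bytes
            double perPeer = bytesPerHour / 3600.0;       // 15.36 B/s per peer
            System.out.println(bytesPerHour + " B/h = " + perPeer
                    + " B/s per peer, " + (perPeer * 20) + " B/s for 20 peers");
        }
    }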

--
Robert Hailey
