On May 1, 2009, at 3:02 PM, Matthew Toseland wrote:
[20:37:16] <evanbd> You standardize on a size for the bloom filters; say 1MiB. Then, if your store has 100GB of data, and needs 18MiB of bloom filters, you partition the keyspace into 18 chunks of roughly equal population. Each segment of the keyspace then gets its own bloom filter.
[20:38:24] <evanbd> Then, if my node has 100MiB of memory to spend on my peers' bloom filters, and 20 peers, I just ask each peer for up to 5 bloom filters.
[20:40:11] <toad_> evanbd: it's good to have a partial filter for each node yes
[20:40:24] <toad_> evanbd: however, you end up checking more filters which increases false positives
[20:40:52] <evanbd> No, the fp rate stays the same.
[20:41:18] <evanbd> Suppose your node has 18 filters, each with a 1E-5 fp rate
[20:41:36] <evanbd> When I get a request, I compare it to your node's filter set.
[20:41:57] <evanbd> But only *one* of those filters gets checked, since each one covers a different portion of the keyspace
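To make sure I'm reading that right, here is a very rough sketch of the lookup side (my own names and types, assuming keys map to locations in the circular [0,1) keyspace; this is not the actual Freenet code):

    // Minimal stand-in for whatever filter implementation is actually used.
    interface BloomFilter {
        boolean mightContain(byte[] key);
    }

    class PartitionedStoreFilter {
        private final BloomFilter[] segments;  // one ~1MiB filter per keyspace segment
        private final double[] upperBounds;    // ascending segment upper bounds in [0,1),
                                               // chosen so each segment holds roughly the
                                               // same number of stored keys

        PartitionedStoreFilter(BloomFilter[] segments, double[] upperBounds) {
            this.segments = segments;
            this.upperBounds = upperBounds;
        }

        // location = the key's position in the circular [0,1) keyspace.
        boolean mightContain(double location, byte[] key) {
            int i = 0;
            while (i < upperBounds.length - 1 && location >= upperBounds[i]) i++;
            // Exactly one filter is consulted per request.
            return segments[i].mightContain(key);
        }
    }

Since each key falls into exactly one segment, only that segment's filter is ever consulted, so the per-request false positive probability is the same as with a single monolithic filter using the same bits per key.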
If requested to send 5 partial filters, which do you send? I presume those closest to the originator's present location (rough sketch of that selection below).
Looks like you may actually end up checking fewer filters overall (if the blooms from large peers are out of range of the request).
Nice idea, Evan!
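The selection I had in mind would be something like this, assuming we track the centre of each segment and that "closest" means circular keyspace distance to the requesting peer's location (again purely illustrative, not real code):

    class FilterSelection {
        // Send the 'limit' filters whose segments sit closest to the requesting
        // peer's current location in the circular [0,1) keyspace.
        static int[] chooseSegmentsToSend(double peerLocation, double[] segmentCenters, int limit) {
            Integer[] order = new Integer[segmentCenters.length];
            for (int i = 0; i < order.length; i++) order[i] = i;
            java.util.Arrays.sort(order, (a, b) -> Double.compare(
                    circularDistance(segmentCenters[a], peerLocation),
                    circularDistance(segmentCenters[b], peerLocation)));
            int n = Math.min(limit, order.length);
            int[] chosen = new int[n];
            for (int i = 0; i < n; i++) chosen[i] = order[i];
            return chosen;
        }

        static double circularDistance(double a, double b) {
            double d = Math.abs(a - b);
            return Math.min(d, 1.0 - d);  // the keyspace wraps around at 1.0
        }
    }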
On May 1, 2009, at 2:58 PM, Evan Daniel wrote:
The address of a bit can be represented in 32 bits trivially (for
bloom filters < 512MiB in size), so the 1-hour diff should consume
13824*32/8=55296 bytes. That represents 15.36 bytes/s of traffic for
each peer, or 307.2B/s across 20 peers.
Ok. I'm convinced.
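For reference, a quick sanity check of those figures, under the same assumptions Evan stated (13824 changed bits per hour, 32-bit bit addresses, 20 peers; the class name is just for illustration):

    public class DiffTrafficCheck {
        public static void main(String[] args) {
            long bytesPerHour = 13824L * 32 / 8;      // 55296 bytes per peer per hour
            double perPeer = bytesPerHour / 3600.0;   // 15.36 bytes/s per peer
            double total = perPeer * 20;              // 307.2 bytes/s across 20 peers
            System.out.printf("%d bytes/hour, %.2f B/s per peer, %.1f B/s total%n",
                    bytesPerHour, perPeer, total);
        }
    }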
--
Robert Hailey