On May 1, 2009, at 9:35 AM, Matthew Toseland wrote:

> THE NUMBERS:
>
> The salted-hash datastore uses by default 1/2048th of the store size for the Bloom filters. This works out to a 23-element filter with 2-bit counting, i.e. 46 bits per key in the store (ssk, chk, pubkey, store or cache). The easiest thing would be to reduce this to a 1-bit filter, cutting its size in half. Also we don't need the pubkey store. So the filter for our peers would be approximately 1/6000th of the size of our datastore. This gives 18MB per 100GB datastore size.
>
> We could halve this again, but it would mean that the average failing request is diverted 2.5 times to nodes which might have the data but don't (24 hops * 20 peers = 480 bloom checks, vs 0.005% false positives); each time the request has to wait.
>
> So if we have 20 peers each with a 500GB store, that gives 1.8GB of Bloom filters! On the other hand, if we have 20 peers each with a 10GB store, that gives 36MB of Bloom filters. Clearly the latter is acceptable; the former probably isn't for most nodes.
>
> Also note that for acceptable performance it is essential that the Bloom filters for all our online peers can fit into RAM. What we are going to have to do is set an upper limit on the total memory we can use for Bloom filters, and drop the biggest filters until we reach the target.
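
For concreteness, the totals in that paragraph follow directly from the stated ~1/6000th ratio; here's a quick sketch of that arithmetic, assuming binary GB and nothing else (the class and method names are just illustrative, not existing code):

    // Back-of-the-envelope check of the quoted totals, assuming the
    // per-peer filter is ~1/6000th of that peer's datastore size.
    public class FilterSizeEstimate {
        static final double GB = 1L << 30;               // assuming binary GB

        static double filterBytes(double storeBytes) {
            return storeBytes / 6000.0;                   // ~1/6000th of the store
        }

        public static void main(String[] args) {
            System.out.printf("100GB store -> %.1f MB filter%n",
                    filterBytes(100 * GB) / 1e6);         // ~18 MB
            System.out.printf("20 x 500GB  -> %.2f GB of filters%n",
                    20 * filterBytes(500 * GB) / 1e9);    // ~1.8 GB
            System.out.printf("20 x 10GB   -> %.1f MB of filters%n",
                    20 * filterBytes(10 * GB) / 1e6);     // ~36 MB
        }
    }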

Am I missing something? That's not what I get, referencing...
http://en.wikipedia.org/wiki/Bloom_filter

"In theory, an optimal data structure equivalent to a counting Bloom filter should not use more space than a static Bloom filter."

We have...
10 GB store => 333k keys (the elements to be put in the filter; "n" in the article). The article says 9.6 bits/element (with good hash functions) will yield a 1% error rate.

9.6 * 333*10^3 = 3,196,800 bits ~= 3 Mbits (filter size; "m" in the article) ~= 390kB

n=333,333
m=3,196,800

That only leaves 'k' (the number of bits set per entry, i.e. the number of hash functions), which for an optimally sized filter is log2(1/ε); the space cost follows from "Classic Bloom filters use 1.44 log2(1/ε) bits of space per inserted key, where ε is the false positive rate".

So then your 0.005% yields ~14 bits set per key:
log2( 1/(0.005*0.01) ) ~= 14.3, giving 1.44 * 14.3 ~= 21 bits of space per key.
Or @ 1% => ~7 bits set per key and the ~9.6 bits of space per key above.

Either way, that's well short of the 46 bits per key quoted above.
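
For anyone who wants to re-run the arithmetic, here's a small sketch of those two formulas; nothing Freenet-specific is assumed, just n and the target false-positive rate:

    // Optimal Bloom filter parameters for n elements and false-positive rate eps:
    //   bits set per entry (number of hashes) k   = log2(1/eps)
    //   bits of space per element             m/n = 1.44 * log2(1/eps)
    public class BloomParams {
        static double log2(double x) { return Math.log(x) / Math.log(2); }

        static void print(long n, double eps) {
            double k = log2(1.0 / eps);                   // hash functions / bits set per key
            double bitsPerElement = 1.44 * k;             // space per key
            double mBits = bitsPerElement * n;            // total filter size in bits
            System.out.printf("n=%d eps=%.3f%% -> k=%.1f, %.1f bits/key, m=%.0f bits (~%.0f kB)%n",
                    n, eps * 100, k, bitsPerElement, mBits, mBits / 8 / 1000);
        }

        public static void main(String[] args) {
            print(333_333, 0.01);      // 10GB store at 1%:     ~6.6 hashes, ~9.6 bits/key, ~400 kB
            print(333_333, 0.00005);   // same store at 0.005%: ~14.3 hashes, ~21 bits/key, ~860 kB
        }
    }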

Some questions though...

It still seems to me that all this is moving away from darknet routing, and that we will be sending our peers 'updates' on the new things we've acquired in our datastore... Is this adding an optimization to the routing algorithm, or replacing it?

Are you recommending that we use variable-sized Bloom filters (based on datastore size)? It seems to me that a node could have enough logic to determine whether a static-sized Bloom filter is 'too full' (say, if >50-75% of the bits are set at update time) and simply not count it, i.e. the node's datastore is too big for Bloom functionality; something like the check sketched below.
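
As a rough illustration (my own sketch, not existing Freenet code), the 'too full' test is just a population count over the advertised filter:

    import java.util.BitSet;

    // Hypothetical helper (illustration only): decide whether an advertised
    // filter is too full to be useful. An optimally loaded Bloom filter has
    // ~50% of its bits set, so a fill fraction well above that means the
    // false-positive rate has blown up.
    public class FilterFillCheck {
        static final double MAX_FILL = 0.75;   // example cutoff from the 50-75% range above

        static boolean tooFull(BitSet filter, int filterBits) {
            double fill = (double) filter.cardinality() / filterBits;
            return fill > MAX_FILL;
        }
    }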

For that matter, could a peer manipulate its advertised filter to attract more requests and thus monitor activity? (e.g. by advertising a filter with all bits set)

--
Robert Hailey

_______________________________________________
Devl mailing list
[email protected]
http://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl
