On May 1, 2009, at 9:35 AM, Matthew Toseland wrote:
> THE NUMBERS:
>
> The salted-hash datastore uses by default 1/2048th of the store size for the Bloom filters. This works out to a 23-element filter with 2-bit counting, i.e. 46 bits per key in the store (SSK, CHK, pubkey; store or cache). The easiest thing would be to reduce this to a 1-bit filter, cutting its size in half. Also we don't need the pubkey store. So the filter for our peers would be approximately 1/6000th of the size of our datastore. This gives 18MB per 100GB of datastore.
>
> We could halve this again, but it would mean that the average failing request is diverted 2.5 times to nodes which might have the data but don't (24 hops * 20 peers = 480 Bloom checks, vs 0.005% false positives); each time the request has to wait.
>
> So if we have 20 peers each with a 500GB store, that gives 1.8GB of Bloom filters! On the other hand, if we have 20 peers each with a 10GB store, that gives 36MB of Bloom filters. Clearly the latter is acceptable; the former probably isn't for most nodes.
>
> Also note that for acceptable performance it is essential that the Bloom filters for all our online peers fit into RAM. What we are going to have to do is set an upper limit on the total memory we can use for Bloom filters, and drop the biggest filters until we reach the target.
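(As an aside, the cap-and-drop policy at the end could be as simple as the sketch below. The class names, fields and the 128MB cap are all my own assumptions, not anything in the current code:)

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;

    // Sketch of "cap total filter memory, drop the biggest filters first".
    class PeerFilter {
        final long sizeBytes; // roughly storeSizeBytes / 6000, per the estimate above
        PeerFilter(long sizeBytes) { this.sizeBytes = sizeBytes; }
    }

    class BloomBudget {
        static final long MAX_FILTER_BYTES = 128L * 1024 * 1024; // hypothetical cap

        /** Returns the subset of peer filters we keep in RAM. */
        static List<PeerFilter> applyCap(List<PeerFilter> filters) {
            List<PeerFilter> kept = new ArrayList<PeerFilter>(filters);
            // Sort biggest-first, so the biggest are the first to be dropped.
            Collections.sort(kept, new Comparator<PeerFilter>() {
                public int compare(PeerFilter a, PeerFilter b) {
                    return Long.compare(b.sizeBytes, a.sizeBytes);
                }
            });
            long total = 0;
            for (PeerFilter f : kept) total += f.sizeBytes;
            while (total > MAX_FILTER_BYTES && !kept.isEmpty()) {
                total -= kept.remove(0).sizeBytes; // drop the current biggest
            }
            return kept;
        }
    }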
Am I missing something? That's not what I get, referencing...
http://en.wikipedia.org/wiki/Bloom_filter
"In theory, an optimal data structure equivalent to a counting Bloom
filter should not use more space than a static Bloom filter."
We have...
10 GB store => 333k keys (the elements to be put in the filter; "n" in the article)
The article says 9.6 bits/element (with good hash functions) will yield a 1% error rate.
9.6 * 333*10^3 = 3,196,800 bits ~= 3 Mbits ~= 390kB (the filter size; "m" in the article)
n = 333,333
m = 3,196,800
That only leaves "k", the number of hash functions, i.e. the number of bits set per inserted key. The article says: "Classic Bloom filters use 1.44log2(1 / ε) bits of space per inserted key, where ε is the false positive rate", and the optimal k works out to log2(1/ε).
So your 0.005% yields ~14 bits set per key:
k = log2( 1/(0.005*0.01) ) ~= 14.3
at a cost of 1.44 * 14.3 ~= 20.6 bits of space per key.
Or @ 1% => k ~= 7 bits set per key, at the 9.6 bits of space per key above.
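(The same arithmetic as a throwaway sketch; nothing Freenet-specific, and the class/method names are made up:)

    // Optimal Bloom filter parameters for n keys at false-positive rate eps:
    //   bits set per key (hash functions): k = log2(1/eps)
    //   bits of space per key:             1.44 * log2(1/eps)
    class BloomMath {
        static void show(long n, double eps) {
            double log2InvEps = Math.log(1.0 / eps) / Math.log(2.0);
            long k = Math.round(log2InvEps);            // hash functions
            long m = Math.round(n * 1.44 * log2InvEps); // total filter bits
            System.out.printf("n=%d eps=%g -> k=%d, m=%d bits (~%.0f kB)%n",
                    n, eps, k, m, m / 8192.0);
        }

        public static void main(String[] args) {
            show(333333L, 0.01);    // 1%:     k=7,  ~3.2 Mbits, ~390 kB
            show(333333L, 0.00005); // 0.005%: k=14, ~6.9 Mbits, ~840 kB
        }
    }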
Some questions though...
It still seems to me that all this is moving away from darknet routing, and that we will be sending our peers "updates" on the new things we've acquired in our datastores... Is this adding an optimization to the routing algorithm, or replacing it?
Are you recommending that we use variable-size Bloom filters (based on datastore size)?
It seems to me that a node could have enough logic to determine whether a static-sized Bloom filter is "too full" (if, say, more than 50-75% of its bits are set at update time), and simply not count it (the node's datastore being too big for the Bloom filter to be useful).
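(Something like this; the threshold and names are made up:)

    import java.util.BitSet;

    // Sketch: treat a peer's advertised filter as unusable once it is
    // mostly ones, i.e. too full to exclude much of anything.
    class FilterSanity {
        static final double MAX_FILL_RATIO = 0.75; // arbitrary cut-off

        static boolean usable(BitSet filter, int filterBits) {
            double fill = (double) filter.cardinality() / filterBits;
            return fill <= MAX_FILL_RATIO;
        }
    }

The same check would incidentally reject the deliberate all-1's filter I ask about next, though a peer could still pad its filter to just under whatever threshold we pick.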
For that matter, could a peer manipulate its advertised filter to get more requests and thus monitor activity? (e.g. full-set/all-1's)
--
Robert Hailey