On May 1, 2009, at 9:35 AM, Matthew Toseland wrote:

> THE NUMBERS:
> The salted-hash datastore uses by default 1/2048th of the store size for
> the Bloom filters. This works out to a 23-element filter with 2-bit
> counting, i.e. 46 bits per key in the store (ssk, chk, pubkey, store or
> cache). The easiest thing would be to reduce this to a 1-bit filter,
> cutting its size in half. Also we don't need the pubkey store. So the
> filter for our peers would be approximately 1/6000th of the size of our
> datastore. This gives 18MB per 100GB datastore size. We could halve this
> again but it would mean that the average failing request is diverted to
> nodes which might have the data but don't 2.5 times (24 hops * 20 peers =
> 480 bloom checks, vs 0.005% false positives); each time the request has
> to wait. So if we have 20 peers each with a 500GB store, that gives 1.8GB
> of Bloom filters! On the other hand if we have 20 peers each with a 10GB
> store, that gives 36MB of Bloom filters. Clearly the latter is acceptable;
> the former probably isn't for most nodes. Also note that for acceptable
> performance it is essential that the Bloom filters for all our online
> peers can fit into RAM. What we are going to have to do is set an upper
> limit on the total memory we can use for Bloom filters, and drop the
> biggest filters until we reach the target.
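(Just to restate the quoted totals as arithmetic: a rough sketch that takes the ~1/6000 store-to-filter ratio and 2^30-byte gigabytes from the figures above at face value; the class and method names are purely illustrative.)

    // Back-of-the-envelope check of the quoted sizing: with a 1-bit
    // (non-counting) filter and no pubkey entries, each peer's filter is
    // roughly 1/6000th of that peer's datastore.
    public class BloomFilterSizing {
        static final long GB = 1L << 30;            // treating 1GB as 2^30 bytes
        static final double STORE_TO_FILTER = 6000; // ~1/6000th, per the estimate above

        static double filterMB(long storeBytes) {
            return storeBytes / STORE_TO_FILTER / 1e6; // decimal megabytes
        }

        public static void main(String[] args) {
            System.out.printf("100GB store      -> %.0f MB filter%n", filterMB(100 * GB));          // ~18 MB
            System.out.printf("20 peers x 500GB -> %.0f MB of filters%n", 20 * filterMB(500 * GB)); // ~1790 MB, i.e. ~1.8 GB
            System.out.printf("20 peers x 10GB  -> %.0f MB of filters%n", 20 * filterMB(10 * GB));  // ~36 MB
        }
    }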
Am I missing something? That's not what I get, referencing...

http://en.wikipedia.org/wiki/Bloom_filter

"In theory, an optimal data structure equivalent to a counting Bloom filter should not use more space than a static Bloom filter."

We have...

10 GB store => 333k keys (elements to be put in the filter; "n" in the article)

The article says 9.6 bits/element (with good hash functions) will yield a 1% error rate.

9.6 * 333*10^3 = 3,196,800 bits ~= 3.2 Mbits (filter size; "m" in the article), i.e. ~390kB

n = 333,333
m = 3,196,800

That only leaves 'k' (the number of bits set per entry). The article says "Classic Bloom filters use 1.44 log2(1/ε) bits of space per inserted key, where ε is the false positive rate"; the number of bits actually set per key is just log2(1/ε), without the 1.44 factor.

So your 0.005% yields k = log2( 1/(0.005*0.01) ) ~= 14 bits set per key, which works out to ~21 bits of space per key (14.3 * 1.44).

Or @ 1%: k ~= 7 bits set per key, ~10 bits of space per key.

Some questions though...

It still seems to me that all this is moving away from darknet routing, and that we will be sending our peers 'updates' on the new things we've acquired in our datastore... Is this adding an optimization to the routing algorithm, or replacing it?

Are you recommending that we use variable sizes of Bloom filter (based on datastore size)? It seems to me that a node could have enough logic to determine whether a static-sized Bloom filter is 'too full' (e.g. if >50% or 75% of its bits are set at update time), and simply not count it (the node has too big a datastore for Bloom functionality to help); rough sketch in the P.S. below.

For that matter, could a peer manipulate its advertised filter to get more requests and thus monitor activity? (e.g. advertise a fully-set/all-1's filter)

--
Robert Hailey
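P.S. A minimal sketch of the 'too full' check I have in mind, assuming the advertised filter arrives as a plain bit array (java.util.BitSet here); the 75% threshold is just an example, not a tuned value. The same test would also flag the all-1's manipulation case above.

    import java.util.BitSet;

    // Decide whether a peer's advertised Bloom filter is worth consulting.
    // A mostly-full filter answers "maybe" to nearly every key (an all-1's
    // filter matches everything), so routing on it is no better than guessing.
    public class BloomFillCheck {
        static final double MAX_FILL = 0.75; // example threshold; 0.5 is another candidate

        static boolean usable(BitSet filter, int filterSizeBits) {
            double fill = filter.cardinality() / (double) filterSizeBits;
            return fill <= MAX_FILL;
        }
    }

Once the fill fraction passes the design point (about 50% for an optimally sized filter), the expected false-positive rate, roughly fill^k, climbs very quickly.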
