On May 1, 2009, at 9:35 AM, Matthew Toseland wrote:

> THE NUMBERS:
> The salted-hash datastore uses by default 1/2048th of the store size for
> the Bloom filters. This works out to a 23-element filter with 2-bit
> counting, i.e. 46 bits per key in the store (ssk, chk, pubkey, store or
> cache). The easiest thing would be to reduce this to a 1-bit filter,
> cutting its size in half. Also we don't need the pubkey store. So the
> filter for our peers would be approximately 1/6000th of the size of our
> datastore. This gives 18MB per 100GB datastore size. We could halve this
> again but it would mean that the average failing request is diverted to
> nodes which might have the data but don't 2.5 times (24 hops * 20 peers =
> 480 bloom checks, vs 0.005% false positives); each time the request has
> to wait. So if we have 20 peers each with a 500GB store, that gives 1.8GB
> of Bloom filters! On the other hand if we have 20 peers each with a 10GB
> store, that gives 36MB of Bloom filters. Clearly the latter is acceptable;
> the former probably isn't for most nodes. Also note that for acceptable
> performance it is essential that the Bloom filters for all our online
> peers can fit into RAM. What we are going to have to do is set an upper
> limit on the total memory we can use for Bloom filters, and drop the
> biggest filters until we reach the target.

Am I missing something? That's not what I get, referencing...
http://en.wikipedia.org/wiki/Bloom_filter

"In theory, an optimal data structure equivalent to a counting Bloom  
filter should not use more space than a static Bloom filter."

We have...
10 GB store => 333k keys (elements to be put in filter; "n" in the article)
The article says 9.6 bits/element (with good hash functions) will yield a 1%
error rate.

9.6 * 333*10^3 = 3,196,800 bits ~= 3 Mbits (filter size; "m" in the article)
~= 390kB

n=333,333
m=3,196,800
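
(The same arithmetic in code, just as a sanity check; plain Java, nothing
Freenet-specific.)

// Back-of-the-envelope check of n and m above.
public class FilterSizeCheck {
    public static void main(String[] args) {
        double n = 333_333;            // keys in a 10GB store, as above
        double bitsPerElement = 9.6;   // the article's figure for ~1% false positives
        double mBits = n * bitsPerElement;
        System.out.printf("m ~= %.0f bits ~= %.0f kB%n", mBits, mBits / 8 / 1024);
    }
}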

That only leaves 'k' (the number of hash functions, i.e. bits set per
entry); "Classic Bloom filters use 1.44 log2(1/ε) bits of space per
inserted key, where ε is the false positive rate".

So your 0.005% yields k = log2( 1/(0.005*0.01) ) ~= 14 bits set per key,
i.e. 1.44 * 14.3 ~= 21 bits of space per key.
Or @ 1% => k ~= 7 bits set per key (~10 bits of space, i.e. the 9.6 above).
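
(And the k / space-per-key arithmetic for both error rates, same style; the
1.44 factor is the space overhead from the article's formula.)

// k = log2(1/eps) bits set per inserted key; ~1.44 * k bits of space per key.
public class BloomParamsCheck {
    static double log2(double x) { return Math.log(x) / Math.log(2); }

    public static void main(String[] args) {
        for (double eps : new double[] { 0.005 * 0.01, 0.01 }) {   // 0.005% and 1%
            double k = log2(1 / eps);
            System.out.printf("eps=%.5f  k ~= %.1f  bits/key ~= %.1f%n",
                              eps, k, 1.44 * k);
        }
    }
}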

Some questions though...

It still seems to me that all this is moving away from darknet routing, and
that we will be sending our peers 'updates' on the new things we've acquired
in our data store... Is this adding an optimization to the routing algorithm,
or replacing it?

Are you recommending that we use variable sizes of Bloom filter (based on
data store size)?
It seems to me that a node could have enough logic to determine whether a
static-sized Bloom filter is 'too full' (if, say, more than 50-75% of its
bits are set at update time), and simply not count it (the node has too big
a datastore for Bloom functionality).
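
(Roughly what I mean, as a sketch; java.util.BitSet just stands in for
whatever structure the filter actually uses, and the threshold is arbitrary.)

import java.util.BitSet;

// Ignore a peer's advertised filter if too large a fraction of its bits are set.
public class FilterFullCheck {
    static final double MAX_FILL = 0.5;   // or 0.75 -- a tunable threshold

    static boolean tooFull(BitSet filter, int filterSizeBits) {
        double fill = (double) filter.cardinality() / filterSizeBits;
        return fill > MAX_FILL;   // too full => the false-positive rate makes it useless
    }
}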

For that matter, could a peer manipulate its advertised filter to attract
more requests and thus monitor activity? (e.g. by advertising a full set,
all 1's)

--
Robert Hailey
