On Friday 01 May 2009 15:35:22 Matthew Toseland wrote:
> IMHO all the evidence points to Bloom filter sharing as *substantially*
> improving both speed and data reachability. I wonder if it might not be
> possible to implement it quickly, either before 0.8, between 0.8-beta1 and
> 0.8 final, or for a quick 0.8.1/0.9? The main issues:
>
> THE NUMBERS:
> The salted-hash datastore uses by default 1/2048th of the store size for
> the Bloom filters. This works out to a 23-element filter with 2-bit
> counting, i.e. 46 bits per key in the store (ssk, chk, pubkey, store or
> cache). The easiest thing would be to reduce this to a 1-bit filter,
> cutting its size in half. Also we don't need the pubkey store. So the
> filter for our peers would be approximately 1/6000th of the size of our
> datastore. This gives 18MB per 100GB of datastore. We could halve this
> again, but then the average failing request would be diverted about 2.5
> times to nodes which appear to have the data but don't (24 hops * 20 peers
> = 480 bloom checks, at a ~0.5% false positive rate); each diversion makes
> the request wait. So if we have 20 peers each with a 500GB store, that
> gives 1.8GB of Bloom filters! On the other hand, if we have 20 peers each
> with a 10GB store, that gives 36MB of Bloom filters. Clearly the latter is
> acceptable; the former probably isn't for most nodes.
> Also note that for acceptable performance it is essential that the Bloom
> filters for all our online peers fit into RAM. What we are going to have
> to do is set an upper limit on the total memory we can use for Bloom
> filters, and drop the biggest filters until we reach the target.
>
> Say we set the bloom limit to 128MB. Then a typical node would have:
> 192MB - default heap limit, for freenet itself
> 25MB - bloom filter for 50GB local datastore
> 128MB - default remote bloom filters limit, allows peers to have 700GB
> total store size
>
> For a total of around 345MB plus java overheads. Is this sort of number
> acceptable?
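A quick back-of-the-envelope sketch of the sizing above, for anyone who wants
to play with the numbers. This is not Freenet code: the class and constants
are mine, taken from the figures quoted (1-bit shared filters at ~1/6000th of
store size, 2-bit counting local filters at 1/2048th), and the false-positive
line uses the idealized optimal-k Bloom formula, so it only roughly
reproduces the ~2.5 diversions figure.

// Back-of-the-envelope calculator for the sizing above. NOT Freenet code;
// the constants are assumptions taken from the figures in this thread.
public class BloomFilterSizing {

    static final long GB = 1024L * 1024 * 1024;
    static final long MB = 1024L * 1024;

    // Shared 1-bit filter sent to peers: ~1/6000th of the datastore size.
    static long sharedFilterBytes(long storeBytes) {
        return storeBytes / 6000;
    }

    // Local 2-bit counting filter: 1/2048th of the datastore size.
    static long localFilterBytes(long storeBytes) {
        return storeBytes / 2048;
    }

    // Idealized (optimal-k) Bloom filter false positive rate for a given
    // number of bits per key: p ~= 0.6185 ^ (bits per key).
    static double falsePositiveRate(double bitsPerKey) {
        return Math.pow(0.6185, bitsPerKey);
    }

    public static void main(String[] args) {
        System.out.printf("20 peers x 500GB stores: %d MB of remote filters%n",
                20 * sharedFilterBytes(500 * GB) / MB);   // ~1.7GB
        System.out.printf("20 peers x 10GB stores:  %d MB of remote filters%n",
                20 * sharedFilterBytes(10 * GB) / MB);    // ~34MB
        System.out.printf("50GB local store:        %d MB counting filter%n",
                localFilterBytes(50 * GB) / MB);          // 25MB

        // Halving the shared filter again (~11.5 bits per key): expected
        // spurious diversions per failing request, 24 hops * 20 peers.
        double fp = falsePositiveRate(11.5);
        System.out.printf("fp rate %.2f%%, ~%.1f spurious diversions per "
                + "failing request%n", 100 * fp, 480 * fp);
    }
}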
Updated numbers on the bandwidth overhead: My node has a 30KB/sec limit and a
total of 0.24 cache/store writes per second. With some help from evanbd:

0.24 writes/sec
* 3600 s/hour
* 2 keys per write (1 added, 1 removed)
* 16 bits per key
* 0.5 (only half of the bits actually change, on average)
* 30 (bits per sample, naive encoding)
* (1/8) (bits per byte)
* 20 (send to all peers)
/ 3600 (to get bytes/sec)
= 288 bytes per second

My node has a 500GB datastore, so 700 million keys, hence the 30 bits;
rounding each sample up to the nearest byte increases this to 307 bytes per
second. Arithmetic encoding cuts it to 139 bytes per second. This is update
traffic. On the other hand, the initial transfer would be rather larger -
around 80MB.
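Written out as code so the parameters are easy to vary (again a sketch, not
Freenet code; the names are mine, the hourly batching is dropped because it
cancels out, and arithmetic coding is not modelled):

// Sketch of the update-traffic estimate above. Parameter names are mine,
// the hourly batching is omitted because it cancels out, and the
// arithmetic-coded case (139 bytes/sec) is not modelled.
public class BloomUpdateTraffic {
    public static void main(String[] args) {
        double writesPerSec = 0.24;    // cache/store writes per second
        double keysPerWrite = 2;       // one key added, one evicted
        double bitsSetPerKey = 16;     // filter bits touched per key
        double fractionChanged = 0.5;  // only half the bits actually flip
        double bitsPerSample = 30;     // naive index encoding (~700M slots)
        double peers = 20;             // updates go to every peer

        double bitsPerSec = writesPerSec * keysPerWrite * bitsSetPerKey
                * fractionChanged * bitsPerSample * peers;

        System.out.printf("naive 30-bit samples:      %.0f bytes/sec%n",
                bitsPerSec / 8);                 // ~288 bytes/sec
        System.out.printf("samples padded to 32 bits: %.0f bytes/sec%n",
                bitsPerSec / 8 * 32 / 30);       // ~307 bytes/sec
    }
}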
