On Friday 01 May 2009 15:35:22 Matthew Toseland wrote:
> IMHO all the evidence points to Bloom filter sharing as *substantially*
> improving both speed and data reachability. I wonder if it might not be
> possible to implement it quickly, either before 0.8, between 0.8-beta1 and
> 0.8 final, or for a quick 0.8.1/0.9? The main issues:
>
> THE NUMBERS:
> The salted-hash datastore uses by default 1/2048th of the store size for
> the Bloom filters. This works out to a 23-element filter with 2-bit
> counting, i.e. 46 bits per key in the store (ssk, chk, pubkey, store or
> cache). The easiest thing would be to reduce this to a 1-bit filter,
> cutting its size in half. Also we don't need the pubkey store. So the
> filter for our peers would be approximately 1/6000th of the size of our
> datastore. This gives 18MB per 100GB of datastore. We could halve this
> again, but then the average failing request would be diverted about 2.5
> times to nodes which appear to have the data but don't (24 hops * 20 peers
> = 480 bloom checks, at a ~0.5% false positive rate); each diversion makes
> the request wait. So if we have 20 peers each with a 500GB store, that
> gives 1.8GB of Bloom filters! On the other hand, if we have 20 peers each
> with a 10GB store, that gives 36MB of Bloom filters. Clearly the latter is
> acceptable; the former probably isn't for most nodes.
> Also note that for acceptable performance it is essential that the Bloom
> filters for all our online peers fit into RAM. What we are going to have
> to do is set an upper limit on the total memory we can use for Bloom
> filters, and drop the biggest filters until we reach the target.
>
> Say we set the bloom limit to 128MB. Then a typical node would have:
> 192MB - default heap limit, for freenet itself
> 25MB - bloom filter for 50GB local datastore
> 128MB - default remote bloom filters limit, allows peers to have 700GB
> total store size
>
> For a total of around 345MB plus java overheads. Is this sort of number
> acceptable?
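A quick back-of-the-envelope sketch of the sizing above, for anyone who wants
to play with the numbers. This is not Freenet code: the class and constants
are mine, taken from the figures quoted (1-bit shared filters at ~1/6000th of
store size, 2-bit counting local filters at 1/2048th), and the false-positive
line uses the idealized optimal-k Bloom formula, so it only roughly
reproduces the ~2.5 diversions figure.

// Back-of-the-envelope calculator for the sizing above. NOT Freenet code;
// the constants are assumptions taken from the figures in this thread.
public class BloomFilterSizing {

    static final long GB = 1024L * 1024 * 1024;
    static final long MB = 1024L * 1024;

    // Shared 1-bit filter sent to peers: ~1/6000th of the datastore size.
    static long sharedFilterBytes(long storeBytes) {
        return storeBytes / 6000;
    }

    // Local 2-bit counting filter: 1/2048th of the datastore size.
    static long localFilterBytes(long storeBytes) {
        return storeBytes / 2048;
    }

    // Idealized (optimal-k) Bloom filter false positive rate for a given
    // number of bits per key: p ~= 0.6185 ^ (bits per key).
    static double falsePositiveRate(double bitsPerKey) {
        return Math.pow(0.6185, bitsPerKey);
    }

    public static void main(String[] args) {
        System.out.printf("20 peers x 500GB stores: %d MB of remote filters%n",
                20 * sharedFilterBytes(500 * GB) / MB);   // ~1.7GB
        System.out.printf("20 peers x 10GB stores:  %d MB of remote filters%n",
                20 * sharedFilterBytes(10 * GB) / MB);    // ~34MB
        System.out.printf("50GB local store:        %d MB counting filter%n",
                localFilterBytes(50 * GB) / MB);          // 25MB

        // Halving the shared filter again (~11.5 bits per key): expected
        // spurious diversions per failing request, 24 hops * 20 peers.
        double fp = falsePositiveRate(11.5);
        System.out.printf("fp rate %.2f%%, ~%.1f spurious diversions per "
                + "failing request%n", 100 * fp, 480 * fp);
    }
}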
Updated numbers on the bandwidth overhead: My node has a 30KB/sec limit and a
total of 0.24 cache/store writes per second. With some help from evanbd:

0.24 writes/sec
* 3600 s/hour
* 2 keys per write (1 added, 1 removed)
* 16 bits per key
* 0.5 (only half of the bits actually change, on average)
* 30 (bits per sample, naive encoding)
* (1/8) (bits per byte)
* 20 (send to all peers)
/ 3600 (to get bytes/sec)
= 288 bytes per second

My node has a 500GB datastore, so 700 million keys, hence the 30 bits;
rounding each sample up to the nearest byte increases this to 307 bytes per
second. Arithmetic encoding cuts it to 139 bytes per second. This is update
traffic. On the other hand, the initial transfer would be rather larger -
around 80MB.
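Written out as code so the parameters are easy to vary (again a sketch, not
Freenet code; the names are mine, the hourly batching is dropped because it
cancels out, and arithmetic coding is not modelled):

// Sketch of the update-traffic estimate above. Parameter names are mine,
// the hourly batching is omitted because it cancels out, and the
// arithmetic-coded case (139 bytes/sec) is not modelled.
public class BloomUpdateTraffic {
    public static void main(String[] args) {
        double writesPerSec = 0.24;    // cache/store writes per second
        double keysPerWrite = 2;       // one key added, one evicted
        double bitsSetPerKey = 16;     // filter bits touched per key
        double fractionChanged = 0.5;  // only half the bits actually flip
        double bitsPerSample = 30;     // naive index encoding (~700M slots)
        double peers = 20;             // updates go to every peer

        double bitsPerSec = writesPerSec * keysPerWrite * bitsSetPerKey
                * fractionChanged * bitsPerSample * peers;

        System.out.printf("naive 30-bit samples:      %.0f bytes/sec%n",
                bitsPerSec / 8);                 // ~288 bytes/sec
        System.out.printf("samples padded to 32 bits: %.0f bytes/sec%n",
                bitsPerSec / 8 * 32 / 30);       // ~307 bytes/sec
    }
}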
