On Friday 01 May 2009 15:35:22 Matthew Toseland wrote:
> IMHO all the evidence points to Bloom filter sharing as *substantially* improving both speed and data reachability. I wonder whether it might be possible to implement it quickly: either before 0.8, between 0.8-beta1 and the final 0.8, or for a quick 0.8.1/0.9? The main issues:
>
> THE NUMBERS:
> The salted-hash datastore by default uses 1/2048th of the store size for its Bloom filters. This works out to a 23-element filter with 2-bit counting, i.e. 46 bits per key in the store (SSK, CHK, pubkey; store or cache). The easiest thing would be to reduce this to a 1-bit filter, cutting its size in half. Also, we don't need the pubkey store. So the filter we share with our peers would be roughly 1/6000th of the size of our datastore, which gives 18MB per 100GB of store. We could halve this again, but then the average failing request would be diverted about 2.5 times to nodes which might have the data but don't (24 hops * 20 peers = 480 Bloom checks at a false-positive rate of roughly 0.5%), and each such diversion makes the request wait.
>
> So if we have 20 peers each with a 500GB store, that gives 1.8GB of Bloom filters! On the other hand, if we have 20 peers each with a 10GB store, that gives 36MB of Bloom filters. Clearly the latter is acceptable; the former probably isn't for most nodes. Also note that for acceptable performance it is essential that the Bloom filters for all our online peers fit into RAM. What we are going to have to do is set an upper limit on the total memory we can use for Bloom filters, and drop the biggest filters until we reach the target.
>
> Say we set the Bloom limit to 128MB. Then a typical node would have:
>  192MB - default heap limit, for Freenet itself
>   25MB - Bloom filters for a 50GB local datastore
>  128MB - default limit for remote Bloom filters; allows our peers a total store size of about 700GB
>
> For a total of around 345MB plus Java overheads. Is this sort of number acceptable?
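
As a sanity check on the arithmetic above, here is a back-of-envelope calculation in Java. The 1/6000 ratio and the bits-per-key figures come straight from the paragraph above; the hash-function counts are plausible guesses for filters of that density, not the parameters the salted-hash store actually uses:

    // Rough check of the THE NUMBERS figures; constants are assumptions
    // taken from the mail, not from the freenet source.
    public class BloomSizing {

        // Shared filter ~ 1/6000th of the store (1-bit filter, no pubkeys).
        static long sharedFilterBytes(long storeBytes) {
            return storeBytes / 6000;
        }

        // Standard Bloom false-positive estimate for a filter with
        // bitsPerKey bits per key and k hash functions.
        static double falsePositiveRate(double bitsPerKey, int k) {
            return Math.pow(1 - Math.exp(-k / bitsPerKey), k);
        }

        public static void main(String[] args) {
            long GB = 1024L * 1024 * 1024;

            // 20 peers with 500GB stores -> on the order of 1.7GB of filters.
            System.out.printf("20 x 500GB peers: %d MB%n",
                    20 * sharedFilterBytes(500 * GB) / (1024 * 1024));
            // 20 peers with 10GB stores -> roughly 35MB of filters.
            System.out.printf("20 x 10GB peers:  %d MB%n",
                    20 * sharedFilterBytes(10 * GB) / (1024 * 1024));

            // 23 bits/key (the 1-bit filter) vs ~11.5 bits/key (halved again).
            System.out.printf("fp @ 23   bits/key: %.1e%n", falsePositiveRate(23, 16));
            System.out.printf("fp @ 11.5 bits/key: %.1e%n", falsePositiveRate(11.5, 8));
            // 24 hops * 20 peers = 480 Bloom checks per failing request.
            System.out.printf("diversions per failing request at 11.5 bits/key: %.1f%n",
                    480 * falsePositiveRate(11.5, 8));
        }
    }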

> SECURING IT:
> Giving an index of our datastore to our opennet peers, while storing all locally requested data in our datastore, would seem suicidal from a security perspective. We need to not store requests locally. How?
>
> The best answer at the moment is to implement encrypted rendezvous tunnels. IMHO this will take considerable time to get right, especially given that one tunnel is highly unlikely to provide good performance.
>
> But I wonder if a half-way approach might be sufficient for now, enabling us to not cache locally while not reducing security overall?
> - While the HTL is at maximum, check the datastore but do not cache returned data, and route randomly rather than according to the provided key.
> - We already decrement the HTL from the maximum randomly, and decide this once per peer.
> - We might want to increase the chance of decrementing from its current 25% (an average of 4 hops at maximum) to say 33% or even higher.
>
> IMHO this would take well under one day to implement. The question is, does it provide security *NO WORSE THAN* the current situation? IMHO it does. If we don't change the decrement probability, then nearby attackers already have the maximum-HTL flag to go on. Plus, random routing and routing to random keys are virtually identical for the purposes of correlation attacks.
>
> CHURN IMPACT:
> The main worry with regard to Bloom filter sharing performance is that it won't work on opennet. We only want to share our Bloom filters with long-lived peers, because there is some cost to transferring them in the first place and to keeping them up to date. This is really an unknown variable at this point: what proportion of our opennet peers will be long-lived enough that it is worthwhile sharing our Bloom filters with them?
>
> IMPLEMENTING IT:
> Main tasks:
> - Converting our datastore indexes into a compact, slightly more lossy, 1-bit filter. (Easy)
> - Creating a snapshot once an hour, and keeping it for some period (1 week?). (Easy)
> - Creating an efficient binary diff format for hour-to-hour diffs, and keeping them. (Moderate)
> - Sending our filters to our darknet peers. (Easy)
> - Updating our darknet peers once an hour with the diffs we create anyway. (Easy)
> - Recording which version of the indexes we have sent to each darknet peer. (Easy)
> - Updating our darknet peers when they reconnect after downtime, with all the diffs they missed. (Easy)
> - Using any filters we have to send GetOfferedKey-like requests for the data, and handling such requests. (Easy)
> - Tracking success rates for Bloom-based fetches. (Optional, can be done later) (Easy)
> - Tracking a large number of opennet peers which we have connected to over the last week, recording their total uptime over the week. (Moderate)
> - Calculating which of them have sufficient uptime for sharing Bloom filters to be worthwhile. (Easy)
> - Sending those nodes the Bloom filters, keeping them up to date, and recording which version we sent them last. (Easy)
> - Limiting the total memory usage for Bloom filters, by adding it up and then dropping the biggest filters from active use until we're not over the limit, and telling the affected peers so that they don't send us updates. (Moderate)
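
On the hour-to-hour diffs: the format does not need to be clever, since only a small fraction of the filter changes in an hour. A minimal sketch, assuming the filter is a plain byte array of fixed size (class and method names are made up for illustration, not taken from the tree):

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.DataInputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;

    // Hypothetical hour-to-hour filter diff: a list of (offset, newByte)
    // pairs for every byte that changed between two snapshots.
    public class FilterDiff {

        static byte[] encode(byte[] oldFilter, byte[] newFilter) throws IOException {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            for (int i = 0; i < newFilter.length; i++) {
                if (oldFilter[i] != newFilter[i]) {
                    out.writeInt(i);             // offset into the filter
                    out.writeByte(newFilter[i]); // replacement byte
                }
            }
            return buf.toByteArray();
        }

        // Applied by the peer to bring its copy of our filter up to date.
        static void apply(byte[] filter, byte[] diff) throws IOException {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(diff));
            while (in.available() >= 5) {
                int offset = in.readInt();
                filter[offset] = in.readByte();
            }
        }
    }

Five bytes per changed filter byte is already tiny next to the full filter; if it matters, the offset list compresses well (it is sorted), or the whole diff can simply be gzipped.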
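
Returning to the SECURING IT section above: the max-HTL behaviour it describes is small enough to sketch. Everything here (class name, constants, where the per-peer decision lives) is illustrative only, not the node's actual request-handling code:

    import java.util.Random;

    // Sketch of the "half-way" approach: while the HTL is still at its
    // maximum, check the store but never cache, and route randomly
    // instead of towards the requested key.
    public class MaxHtlPolicy {

        static final short MAX_HTL = 18;
        // Probability of decrementing from the maximum; 25% today,
        // possibly 33% or higher under this proposal.
        static final double DECREMENT_FROM_MAX = 0.25;

        private final Random random = new Random();

        // Decided once per peer connection, not once per request.
        boolean decrementFromMaxForThisPeer() {
            return random.nextDouble() < DECREMENT_FROM_MAX;
        }

        // Never cache data returned while the request is still at max HTL.
        boolean cacheLocally(short htl) {
            return htl < MAX_HTL;
        }

        // Ignore the key's location when choosing the next hop at max HTL.
        boolean routeRandomly(short htl) {
            return htl == MAX_HTL;
        }
    }

With a per-peer decrement probability p, a request spends on average 1/p hops at maximum HTL (4 hops at 25%, 3 at 33%), so raising p shrinks the window in which data is neither cached nor routed towards its key.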

A suggested optimisation from evanbd:

[20:34:30] <evanbd> toad_: I figured out a solution to reducing the bloom filter size for memory management.
[20:37:16] <evanbd> You standardize on a size for the bloom filters; say 1MiB. Then, if your store has 100GB of data, and needs 18MiB of bloom filters, you partition the keyspace into 18 chunks of roughly equal population. Each segment of the keyspace then gets its own bloom filter.
[20:38:24] <evanbd> Then, if my node has 100MiB of memory to spend on my peers' bloom filters, and 20 peers, I just ask each peer for up to 5 bloom filters.
[20:40:11] <toad_> evanbd: it's good to have a partial filter for each node yes
[20:40:24] <toad_> evanbd: however, you end up checking more filters which increases false positives
[20:40:52] <evanbd> No, the fp rate stays the same.
[20:41:18] <evanbd> Suppose your node has 18 filters, each with a 1E-5 fp rate
[20:41:36] <evanbd> When I get a request, I compare it to your node's filter set.
[20:41:57] <evanbd> But only *one* of those filters gets checked, since each one covers a different portion of the keyspace
[20:42:19] <evanbd> So, regardless of how many filters you have, I have a 1E-5 fp rate.
[20:42:46] <evanbd> If I don't have all your filters, the fp rate will (on average) be lower, because for some keys I won't have the appropriate filter.
[20:47:31] <toad_> evanbd: hmmmm ok
[20:48:03] <toad_> evanbd: i don't see this making a big difference though, all it means is for nodes with big stores you get some matches instead of no matches
[20:53:29] <evanbd> It also means that you can send the filters to an opennet peer incrementally, in a way that is useful to it before you have time to send the full 80MB filter.
[20:58:51] <toad_> evanbd: that's a good point
[21:00:04] <evanbd> I think that makes the decision about when to bother sending a filter to a peer easier, too. You could just trickle it at a slow steady pace; if it disconnects before you finish, that's not a huge waste, since hopefully it's gotten some use out of the partial set.
[21:01:10] <evanbd> Anyway, afk for at least a little while.
[21:01:51] <toad_> hmmm that's true
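
evanbd's segmented scheme is easy to picture in code. A minimal sketch, assuming fixed-size segments chosen by the leading bytes of the routing key (routing keys are effectively uniform hashes, so equal-width slices are roughly equal-population); the class and the hash mixing are made up for illustration:

    import java.util.BitSet;

    // The keyspace is split into nSegments slices, each with its own
    // fixed-size Bloom filter, so a peer can usefully hold any subset
    // of the segment list (modelled here as a prefix for simplicity).
    public class SegmentedBloomFilter {

        private final BitSet[] segments;
        private final int bitsPerSegment;
        private final int hashCount;

        SegmentedBloomFilter(int nSegments, int bitsPerSegment, int hashCount) {
            this.segments = new BitSet[nSegments];
            this.bitsPerSegment = bitsPerSegment;
            this.hashCount = hashCount;
            for (int i = 0; i < nSegments; i++)
                segments[i] = new BitSet(bitsPerSegment);
        }

        private int segmentFor(byte[] routingKey) {
            int prefix = ((routingKey[0] & 0xff) << 8) | (routingKey[1] & 0xff);
            return prefix * segments.length / 65536;
        }

        private int bitIndex(byte[] routingKey, int hash) {
            // Toy mixing function standing in for a real double-hashing scheme.
            int h = hash * 0x9E3779B1;
            for (byte b : routingKey) h = h * 31 + b;
            return Math.floorMod(h, bitsPerSegment);
        }

        void add(byte[] routingKey) {
            BitSet seg = segments[segmentFor(routingKey)];
            for (int i = 0; i < hashCount; i++)
                seg.set(bitIndex(routingKey, i));
        }

        // A peer holding only the first segmentsHeld filters gets the
        // full-accuracy answer for keys in those slices and treats
        // everything else as "not present".
        boolean mightContain(byte[] routingKey, int segmentsHeld) {
            int seg = segmentFor(routingKey);
            if (seg >= segmentsHeld) return false;
            for (int i = 0; i < hashCount; i++)
                if (!segments[seg].get(bitIndex(routingKey, i))) return false;
            return true;
        }
    }

The property evanbd points out falls straight out of this: every lookup touches exactly one segment, so the per-request false-positive rate is that of a single filter however many segments a big store is split into, and a peer holding only part of the set simply gets fewer matches and, on average, a lower false-positive rate.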
