If an average node has a store size of 20GB (10GB cache) and adds 0.24 keys per second, it will add a full cache worth of keys in roughly 713 hours. Because the store is salted-hash rather than LRU, it does not wipe the store cleanly over that period; a salted hash store would normally have a half-life for un-wiped keys, although quadratic probing makes this a bit easier.
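For reference, a back-of-the-envelope version of that turnover calculation; the ~32KiB-per-block figure, and counting against the full 20GB store, are assumptions on my part, so it only roughly reproduces the 713 hours:

public class StoreTurnover {
    public static void main(String[] args) {
        double storeBytes = 20e9;        // assumed 20GB store (10GB cache + 10GB store)
        double blockBytes = 32 * 1024;   // assumed ~32KiB per stored block
        double keysPerSecond = 0.24;     // observed rate of new keys
        double keys = storeBytes / blockBytes;
        double hours = keys / keysPerSecond / 3600.0;
        System.out.printf("%.0f keys, %.0f hours (~%.0f days)%n",
                keys, hours, hours / 24.0);
        // Prints roughly 610000 keys, ~706 hours (~29 days): the same
        // ballpark as the 713 hours / 29 days quoted above.
    }
}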
In any case, it takes a *long time*: 29 days. We can maybe halve that figure if we assume almost all new keys are CHKs, but that's still around 14 days, and it's longer on a larger store.

So the question is: how practical is it for a mobile attacker to identify a block in its neighbours' stores (from the bloom filter), get connected to the neighbours' neighbours, download their bloom filters, and trace an insert or request back to its source one hop at a time?

We will likely require some uptime before we start transferring the bloom filter to a node. Assuming the attacker is not already connected, this could add a significant delay: we need to transfer the filter, limited by the upstream bandwidth of the node we are connected to and how much of it that node is prepared to use, and it shouldn't offer the bloom filter until we have had some amount of uptime - maybe 1 week? That adds a delay of roughly 1 week per hop. In some cases there will be so few hops that this won't solve the problem; in other cases a node will have a small cache (or a huge one), so the trail for one key will go cold, although the attacker could try connecting to all of its peers speculatively, or look for the file's other keys. And assuming we don't cache at max HTL, this will only get him so far. But if he did follow the right breadcrumb trail, it should be possible to look for the other keys in nearby nodes, walk around the circle surrounding the originator, and thus maybe figure out which node was the originator, or at least narrow it down to a small group.

With the current plan, we don't store a block until we are at least 2 hops away from the originator, with an average of 4 hops, because of the probabilistic decrement of the HTL from maximum (with probability 50%). Whether to decrement is determined once on each node. Backtracking changes this, so it is possible that the block gets cached on a node immediately adjacent to the originator if there was a lot of backtracking.
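In code terms, the decrement-at-maximum behaviour described above is roughly the following; this is only a sketch of what the text describes, and the class name and max-HTL constant are mine, not the real code:

import java.util.Random;

// Sketch of the probabilistic HTL decrement described above: each node decides
// ONCE, at startup, whether it decrements requests that arrive at maximum HTL,
// with probability 50%. Below max HTL it always decrements, and blocks are not
// cached at max HTL. Names and the max value are illustrative only.
public class HtlPolicySketch {
    static final int MAX_HTL = 18;                  // assumed maximum
    private final boolean decrementAtMax;

    public HtlPolicySketch(Random random) {
        this.decrementAtMax = random.nextBoolean(); // fixed per node, per the text
    }

    /** HTL to forward with, given the HTL the request arrived with. */
    public int nextHtl(int incomingHtl) {
        if (incomingHtl >= MAX_HTL && !decrementAtMax)
            return MAX_HTL;                         // pass on undecremented
        return Math.max(incomingHtl - 1, 0);
    }

    /** Per the plan above: never cache a block that arrives at max HTL. */
    public boolean shouldCache(int incomingHtl) {
        return incomingHtl < MAX_HTL;
    }
}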
If there is no backtracking, half of the originator's peers will have dropped the HTL, so their peers will have cached the data. It is therefore certainly possible to trace it, but it will probably require probing a largish number of nodes - probably hundreds - some of the probes depending on the results of others and some not; the attacker's bandwidth is the main limit on how many he can run in parallel, but by this point he should have found a whole bunch of the file's keys on the node. If he traced right back to the second hop (due to backtracking), he just needs the blooms for the nodes connected to the target; since he doesn't know which one is the target, that means a 2 hop radius - but there is likely to be overlap, so it may not be that many nodes. If he traced back to the third hop (the minimum without backtracking), he has a similar problem in determining the second hop, since we decremented on both the second and the third hops. Once he has found the second hop he can find other second hops, and hence the common first hop, by similar means, since 50% of the second hops decremented. And to identify the originator, he just needs to find lots of first hops. The more connections he can have open at once, the easier this is going to be, and that is limited by his bandwidth; in terms of timing, it is probably possible to avoid this taking the equivalent of many more hops, given reasonable bandwidth (it's a fair bet that the attacker has far more bandwidth than the average node, e.g. because he hired a colo).

Is this worse than the current situation? Assume an attacker is trying to trace the originator of a large file after the fact, i.e. they didn't intercept the insert. They can ask each peer for blocks from the file, and timing will tell them which blocks were already cached locally. Some nodes will answer even though they didn't have the block cached locally, which complicates matters by propagating those blocks a few hops. So the attacker connects to the nodes which already had blocks cached locally, requests those blocks from their peers, and maybe tries a few blocks he hasn't tried yet. The principle is thus very similar.

Once the attacker gets close to the target, things are a bit easier today than they will be under the current plans, because all nodes cached the data (no need to guess at which second hops didn't), and because the originator cached ALL of the data. But further out, bloom filters do have some advantages for an attacker: once we have fetched the bloom filters, we can identify all of the blocks that a peer holds without having to wait for load-limited probe requests, although on the other hand we have to wait for the transfer to finish and/or the uptime threshold. Transferring bloom filters in pieces may help an attacker, but only if there are very many blocks to be found.
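To make the "identify all of the blocks that a peer has" step concrete, something like the following is all an attacker needs once he has a peer's filter and the file's routing keys. This is a minimal stand-in: the real filter format, size, hash count and hashing scheme are not what is sketched here.

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.BitSet;

// Minimal stand-in for probing a peer's bloom filter with the routing keys of
// a known file. A bloom filter never gives false negatives, only false
// positives, so hits on many blocks of the same file are strong evidence the
// peer holds part of it.
public class BloomProbe {
    private final BitSet bits;
    private final int filterBits;
    private final int hashCount;   // assumed <= 8 for a 32-byte digest

    public BloomProbe(BitSet bits, int filterBits, int hashCount) {
        this.bits = bits;
        this.filterBits = filterBits;
        this.hashCount = hashCount;
    }

    public boolean mightContain(byte[] routingKey) throws NoSuchAlgorithmException {
        byte[] d = MessageDigest.getInstance("SHA-256").digest(routingKey);
        for (int i = 0; i < hashCount; i++) {
            // Derive each bit index from a different 4-byte slice of the digest.
            int idx = ((d[4 * i] & 0xff) << 24 | (d[4 * i + 1] & 0xff) << 16
                     | (d[4 * i + 2] & 0xff) << 8 | (d[4 * i + 3] & 0xff)) >>> 1;
            if (!bits.get(idx % filterBits)) return false;
        }
        return true;
    }

    /** How many of the file's blocks does this peer's filter claim to hold? */
    public int countHits(Iterable<byte[]> fileRoutingKeys) throws NoSuchAlgorithmException {
        int hits = 0;
        for (byte[] key : fileRoutingKeys)
            if (mightContain(key)) hits++;
        return hits;
    }
}

The membership test itself is local and essentially free; the only real cost to the attacker is obtaining the filter in the first place, which is exactly why the uptime threshold matters.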
One difficulty is where to start. If we assume we are tracing after the event, we didn't intercept the original insert, so it's unlikely our peers were on the chain and hence unlikely they have the data - but a peer being on the chain is 20x more likely than us having been on it ourselves, so it may be a good starting point nonetheless, especially if we run more than one node. We could in fact combine both attacks: we run a bunch of nodes, we are waiting for a particular file, we pick up from an hourly update that one of our peers was involved in it, and we connect to that peer's peers and try both to intercept more blocks and to get the adjacent nodes' bloom filters (or, in the current code, just probe for blocks, but that is slow). OTOH if we weren't running at the time, this introduces more delays, and the longer after the event we start, the less likely we are to succeed. In conclusion, it is probably a bit easier for an attacker...

How to improve security against such attacks?

- Increasing the uptime required before we will transfer bloom filters is an obvious measure.
- Reducing the average datastore size might help. :|
- Making it harder for a mobile attacker to connect to targeted peers is obviously a good thing for beating all of these attacks.
- Starting to cache further away from the originator might help a lot; probabilistic caching at any distance might help a bit.
- More average bandwidth will help somewhat.
- We should also look at backtracking behaviour: clearly we need to treat routing to the second-choice node differently from routing to the first-choice node (decide whether to decrement based on a different criterion), or requests will go a huge number of hops in some cases. But this isn't great from a security point of view, as you may get a low enough HTL to cache on a node adjacent to the originator.
- A much bigger network would increase the number of hops involved (only log^2 though) and the number of nodes to probe; on the current opennet of a few thousand nodes, a moderately powerful attacker could conceivably connect to everyone and get everyone's bloom filters.
- And of course tunnels would probably solve the problem, but at a rather massive performance cost - which may be a price worth paying for inserts, although not generally for requests.

Caching behaviour could also be different for inserts. E.g. cache min 2 / avg 4 hops away for requests, min 4 / avg 6 hops away for inserts? Or even decrease the decrement probability for inserts, giving e.g. min 4 / avg 8 hops away? The wider the circle, the more nodes it contains, and the more uncertain the distance to the centre (due to the different decrement probabilities), so the more work an attacker has to do to find the centre. Of course, the fact of *not* caching might give away some information, especially for large cache-at hop counts relative to the radius of the network...

IMHO we should probably go ahead with the current plans, with a slightly greater caching distance (a lower cache-at-HTL threshold) for inserts than for requests, and with some sort of uptime threshold for transferring bloom filters. And implement tunnels later on: before 0.8 if we have time (depending on how long it takes to build bloom filters and debug db4o), after it if we don't.
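Concretely, the insert/request distinction and the uptime threshold could be as small a change as something like this (all constants are placeholders I've made up, not decided values):

// Sketch of the proposed caching policy: a lower cache-at-HTL threshold for
// inserts than for requests, plus an uptime threshold before we will transfer
// our bloom filter to a peer. All constants are placeholders.
public class CachePolicySketch {
    static final int MAX_HTL = 18;                        // assumed maximum HTL
    static final int CACHE_AT_HTL_REQUEST = MAX_HTL - 1;  // cache once below max, as currently planned
    static final int CACHE_AT_HTL_INSERT  = MAX_HTL - 3;  // hypothetical: start caching further out
    static final long MIN_PEER_UPTIME_MS = 7L * 24 * 60 * 60 * 1000; // e.g. 1 week

    static boolean shouldCache(int htl, boolean isInsert) {
        return htl <= (isInsert ? CACHE_AT_HTL_INSERT : CACHE_AT_HTL_REQUEST);
    }

    static boolean shouldSendBloomFilter(long peerConnectedUptimeMs) {
        return peerConnectedUptimeMs >= MIN_PEER_UPTIME_MS;
    }
}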
