Currently, the splitfile insert code has two modes:

1. EarlyEncode on: Generate all the keys as soon as possible. Report the top key to the user as soon as possible, and insert the top of the splitfile just like any other block, i.e. it could be inserted very early on.
2. EarlyEncode off: Generate the block keys for a layer of the splitfile pyramid only after we have inserted 2/3rds of the blocks in that layer, i.e. after it has become fetchable. (Hence the FCP message PutFetchable, which means the whole file is fetchable.)
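
As a rough illustration of the two modes, here is a minimal Python sketch of when the block keys for one layer become available. It is not the node's actual code: the function names and the idea of counting blocks per layer are assumptions made for the example; only the 2/3rds threshold comes from the description above.

```python
from fractions import Fraction

# Threshold taken from the description above: a layer counts as fetchable
# once 2/3rds of its blocks have been inserted.
FETCHABLE_FRACTION = Fraction(2, 3)


def layer_is_fetchable(inserted_blocks: int, total_blocks: int) -> bool:
    """True once at least 2/3rds of the layer's blocks have been inserted."""
    if total_blocks <= 0:
        raise ValueError("a layer must contain at least one block")
    return Fraction(inserted_blocks, total_blocks) >= FETCHABLE_FRACTION


def layer_keys_generated(early_encode: bool, inserted_blocks: int,
                         total_blocks: int) -> bool:
    """Hypothetical helper: have the block keys for this layer been generated?

    EarlyEncode on:  keys exist from the start.
    EarlyEncode off: keys exist only once the layer is fetchable.
    """
    if early_encode:
        return True
    return layer_is_fetchable(inserted_blocks, total_blocks)
```

Under these assumptions, a layer of 300 blocks gets its keys immediately with EarlyEncode on, and only once 200 blocks have been inserted with EarlyEncode off.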
The former is discouraged because it is slow: many blocks will need to be encoded twice. It is also insecure. However, the latter is also insecure!

If we are on opennet, or if the attacker is able to find and connect selectively to nodes in specific keyspace areas but is not able to connect to every node at once, his best strategy is a global key-based search attack. This has been known in various forms ever since Freenet started. Basically, if you assume routing works, you can deduce roughly where on the keyspace the requestor/insertor is. So you connect to nodes in that area of the keyspace and attempt to get closer to him. Eventually, provided that he continues to make correlatable requests, you can find him. There is a page on the wiki about this; I apologize to the people who have suggested this whom I haven't cited. I took the explanation from a recent Frost post:

http://wiki.freenetproject.org/KeySearchAttack

The thing is, this attack is most powerful if the attacker can not only observe incoming requests, but also move closer to the target by connecting to nodes near its predicted location, so as to get more requests, more information, and connections to actual nodes rather than just guesses about locations. Therefore, not all requests or inserts are equally vulnerable. If the attacker can predict a large number of keys to be inserted or requested by the target, he can approach the target very rapidly. However, if he cannot predict the keys, and can only inspect logfiles after the event, he can slowly narrow down the keyspace where the requestor might be, but he may be a long way away and not receive many requests, so he will make very slow progress unless he can connect to a significant fraction of the network simultaneously.

So, for example:

- Anonymous Frost posts are less vulnerable than signed Frost posts. It depends on how confidently you can identify the poster of a specific post as the same person who posted other messages.
- Splitfile requests of all kinds are very dangerous. The data is already there; it is presumably published and known about. But inserts are far more interesting for most attackers...
- Predictable splitfile reinserts are very dangerous.
- Predictable splitfile inserts (announcing in advance, e.g. "I'm going to insert my copy of <blah>") are dangerous: the attacker can predict the data to be inserted and therefore the keys to be inserted.
- Unpredictable splitfile inserts cannot be attacked efficiently, *unless the key is known before the insert is completed*. Even if the splitfile is inserted to an SSK owned by the poster, as long as he inserts the splitfile first, and gives no obvious indication in advance of what file he is going to insert, it will not be possible for the attacker to adaptively attack as described above over the course of the splitfile insert.

So let's consider splitfiles. If the user uploads the key to a Frost board after the insert, and the attacker could not predict the file to be inserted, the attacker cannot make much progress. However, if the user uploads the key before the insert is complete (either because he used EarlyEncode and uploaded it *really* early, or because he uploaded it when the file was fetchable, without EarlyEncode), the attacker can identify which requests belong to that insert, and therefore make rapid progress in finding the user.

Another example: if the user uploads a large freesite, or a freesite containing a large file, the most predictable part is the SSK containing the top block. The attacker polls for (or subscribes to) that block, and if it is inserted before the rest of the file, he can again make rapid progress.

Right now, the default is to insert the splitfile pyramid from the bottom, starting each layer after the previous layer becomes fetchable, i.e. after 2/3rds of it has been inserted. Which, as you can see, is bad!
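
To put rough numbers on the splitfile cases above, here is a toy Python calculation of how many blocks remain exposed once the key is public. It assumes a single layer, that the attacker can derive every block key once the top key is known, and that only blocks not yet inserted can be correlated; the function name and the 300-block figure are made up for the example.

```python
def correlatable_blocks(total_blocks: int, inserted_when_key_known: int) -> int:
    """Blocks still to be inserted at the moment the attacker learns the key.

    Toy model: every block inserted after the key is public can be matched
    to this insert by an attacker watching for the predicted block keys.
    """
    if not 0 <= inserted_when_key_known <= total_blocks:
        raise ValueError("inserted_when_key_known must be within 0..total_blocks")
    return total_blocks - inserted_when_key_known


TOTAL = 300  # hypothetical splitfile size in blocks

# Key posted only after the insert has finished.
after_completion = correlatable_blocks(TOTAL, TOTAL)

# Key posted at the fetchable point, i.e. after 2/3rds of the blocks.
at_fetchable = correlatable_blocks(TOTAL, TOTAL * 2 // 3)

# EarlyEncode, key posted before any block has been inserted.
early_encode = correlatable_blocks(TOTAL, 0)
```

In this model the three cases expose 0, 100 and 300 blocks respectively, which is why posting the key at the fetchable point still hands the attacker a third of the insert to work with.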
EarlyEncode makes it worse though, by creating the upper layers as soon as possible (although it doesn't prioritise their insertion, the top block is likely to be inserted before a lot of the rest of the file).

A library of all likely files is going to be very useful to a good attacker... We should also seriously consider encrypting each splitfile with a random overall key. I know we'd lose the CHK-collision effect, but it would make it impossible for an attacker to predict the blocks to be uploaded. On the other hand, the efficiency loss would be quite considerable... Hmmm...

Of course, the attacker can still do longer-term attacks, carried out over a period of time, based on e.g. the Frost identity posting the files, or the freesite or Thaw index on which they are posted. One solution is to use no persistent identity, or to change it frequently; whether this is viable long-term given spam is an open question.

Another fair question: what is the impact of this on premix routing? Premix routing on a darknet obscures the insertor behind a premix cell of 100 or so nodes, each of which could have been the insertor. It prevents effective correlation attacks on your neighbours. Let's assume we have a large darknet, and we have a serious attacker who is not strong enough to compromise a large fraction of the darknet (e.g. because of fear of discovery). He *is* strong enough to choose a direction based on incoming requests, and then compromise some nodes in that rough direction. If he can identify the keys that he is interested in, he can get an idea of which way to go, and progressively move closer to the target. Even with premix routing, he should be able to identify the cell which originated the request. If he is confident enough, and the cell is small enough, he can then bust all the nodes in that cell. So it is essential that premix cells be as large as possible, and that as little information as possible is leaked during inserts.
However, we need to know the topology if we want to keep a strict darknet, so there is likely to be a practical limit on the size of a cell, and worse, the larger the cell, the larger the number of hops taken to traverse a path between 3 nodes within it. A cell needs to be fairly strongly connected to function, so it is likely that all nodes within a cell will be at approximately the same location.

Is there any more universal solution? Randomising the first few hops would not help much against a local/nearby attacker, hence the need for premix routing, but it might help to obscure things against an attacker a long way away. Once a request leaves premix routing, it could be routed randomly for a few hops, or routed to a random location. If an attacker intercepts it while it is still being randomly routed, he knows the originating cell is nearby, but if not, it will be coming from the wrong part of the keyspace. There will probably still be a bias towards the originator's real location, but it should be weaker...

Some attack simulations would be really great. There was a paper posted here a while ago on this topic.
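
Coming back to the suggestion of encrypting each splitfile with a random overall key, the following Python sketch shows why that makes block keys unpredictable. It is a toy under stated assumptions, not Freenet's actual CHK construction: `block_key` stands in for a content-derived key by hashing the stored bytes with SHA-256, and `keystream_xor` is a throwaway XOR cipher used only so the example needs nothing beyond the standard library. All function names are made up for the illustration.

```python
import hashlib
import os


def block_key(stored_bytes: bytes) -> str:
    """Toy content-derived key: the SHA-256 of whatever bytes get stored."""
    return hashlib.sha256(stored_bytes).hexdigest()


def keystream_xor(data: bytes, key: bytes) -> bytes:
    """Toy stream cipher: XOR data with SHA-256(key || counter) blocks.

    Illustration only, not a vetted cipher. Applying it twice with the
    same key returns the original data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + (offset // 32).to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


def insert_keys(blocks, overall_key=None):
    """Keys under which the blocks would be inserted.

    Without overall_key the keys depend only on the file contents, so anyone
    holding the same file can compute them in advance. With a random
    overall_key each block is encrypted first (the block index is mixed into
    the key), so the keys cannot be predicted from the file alone.
    """
    if overall_key is None:
        return [block_key(b) for b in blocks]
    return [
        block_key(keystream_xor(b, overall_key + i.to_bytes(8, "big")))
        for i, b in enumerate(blocks)
    ]


# Hypothetical two-block file that the attacker also has a copy of.
blocks = [b"a" * 64, b"b" * 64]
attacker_guess = insert_keys(blocks)                # computed from the library copy
plain_insert = insert_keys(blocks)                  # same keys: predictable
random_insert = insert_keys(blocks, os.urandom(32)) # different keys each time
```

The same property that defeats the attacker's library also means two people inserting the same file end up with different keys, which is the loss of the CHK-collision effect mentioned above.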
_______________________________________________ Devl mailing list Devl@freenetproject.org http://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl