Currently, the splitfile insert code has two modes:
1. EarlyEncode on: Generate all the keys as soon as possible. Report the top 
key to the user as soon as possible, and insert the top block of the splitfile 
just like any other block, i.e. it could go in very early on.
2. EarlyEncode off: Generate the block keys for a layer of the splitfile 
pyramid only after we have inserted 2/3rds of the blocks in that layer, i.e. 
after it would be fetchable. (Hence the FCP message PutFetchable, which means 
the whole file is fetchable).
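
As a rough sketch of the difference between the two modes (this is not the 
actual node code; the function name and arguments are made up for 
illustration), the rule for when a layer's keys get generated looks something 
like this:

```python
def can_encode_layer_keys(early_encode, inserted, total):
    """Return True if the block keys for one pyramid layer may be generated.

    Hypothetical helper: early_encode mirrors the EarlyEncode setting,
    inserted/total count the blocks of this layer only.
    """
    if early_encode:
        # EarlyEncode on: keys are generated as soon as possible.
        return True
    # EarlyEncode off: wait until 2/3rds of the layer has been inserted,
    # i.e. until the layer is fetchable. Integer arithmetic avoids
    # floating-point rounding surprises.
    return 3 * inserted >= 2 * total
```

For a layer of 128 blocks, this allows key generation immediately with 
EarlyEncode on, and only once 86 blocks are in with EarlyEncode off.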

The former is discouraged because it is slow: many blocks will need to be 
encoded twice. It is also insecure. However, the latter is also insecure!

If we are on opennet, or if the attacker is able to find and connect 
selectively to nodes in specific keyspace areas, but he is not able to 
connect to every node at once, his best strategy is a global key-based search 
attack. This has been known in various forms ever since Freenet started, but 
basically, if you assume routing works, you can deduce roughly where on the 
keyspace the requestor/insertor is. So you connect to nodes in that area of 
the keyspace, and attempt to get closer to him. Eventually, provided that he 
continues to make correlatable requests, you can find him.

There is a page on the wiki about this. I apologize to the people who have 
suggested this attack and whom I haven't cited; I took the explanation from a 
recent Frost post:
http://wiki.freenetproject.org/KeySearchAttack

The thing is, this attack is most powerful if the attacker can not only 
observe incoming requests, but move closer to the target by connecting to 
nodes near to its predicted location, so as to get more requests, more 
information, and connect to actual nodes rather than just thinking about 
locations.

Therefore, not all requests or inserts are equally vulnerable: If the attacker 
can predict a large number of keys to be inserted or requested by the target, 
he can approach the target very rapidly. However, if he cannot predict the 
keys, and can only inspect logfiles after the event, he can slowly narrow 
down the keyspace where the requestor might be, but he may be a long way away 
and not receive many requests, so he will make very slow progress unless he 
can connect to a significant fraction of the network simultaneously.

So, for example:
- Anonymous Frost posts are less vulnerable than signed Frost posts. How 
vulnerable a post is depends on how confidently the attacker can link its 
poster to other messages from the same person.
- Splitfile requests of all kinds are very dangerous: the data is already 
there and presumably published and known about, so the attacker can predict 
the keys. But inserts are far more interesting for most attackers...
- Predictable splitfile reinserts are very dangerous.
- Predictable splitfile inserts (announcing in advance e.g. I'm going to 
insert my copy of <blah>) are dangerous: the attacker can predict the data to 
be inserted and therefore the keys to be inserted.
- Unpredictable splitfile inserts cannot be attacked efficiently, *unless the 
key is known before the insert is completed*. Even if the splitfile is 
inserted to an SSK owned by the poster, as long as he inserts the splitfile 
first, and gives no obvious indication of what file he is going to insert in 
advance, it will not be possible for the attacker to adaptively attack as 
described above over the course of the splitfile insert. 
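
To illustrate why predictable inserts are dangerous: block keys are derived 
from the content, so anybody holding the same file can work out the keys in 
advance and pick the matching inserts out of whatever passes through his 
nodes. The sketch below uses a plain SHA-256 of each block as a stand-in for 
the real CHK computation, and the block size and function names are 
assumptions for illustration only:

```python
import hashlib

BLOCK_SIZE = 32 * 1024  # assumed block size, for illustration only


def predicted_block_keys(data, block_size=BLOCK_SIZE):
    """Keys an attacker can precompute from a file he already has.

    SHA-256 of each block stands in for the real content-derived key.
    """
    return {
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    }


def matching_inserts(predicted, observed):
    """Observed insert keys that belong to the predicted file."""
    return [key for key in observed if key in predicted]
```

Every match tells the attacker that a block of the target's insert passed 
through one of his nodes, which is exactly the feedback he needs to move 
closer. If he cannot predict the data, the predicted set is empty and nothing 
matches.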

So let's consider splitfiles:
If the user uploads the key to a Frost board after the insert, and the 
attacker could not predict the file to be inserted, the attacker cannot make 
much progress. However, if the user uploads the key before the insert is 
complete (either because he used EarlyEncode and uploaded it *really* early, 
or because he uploaded it when the file was fetchable, without EarlyEncode), 
the attacker can identify which requests belong to that insert, and therefore 
make rapid progress in finding the user.

Another example: If the user uploads a large freesite, or a freesite 
containing a large file, the most predictable part is the SSK containing the 
top block. The attacker polls for (or subscribes to) that block, and if it is 
inserted before the rest of the file, he can again make rapid progress. Right 
now, the default is to insert the splitfile pyramid from the bottom, starting 
each layer once the previous layer becomes fetchable, i.e. after 2/3rds of it 
has been inserted. As you can see, that is bad: the top block can go in while 
lower layers are still incomplete. EarlyEncode makes it worse though, by 
creating the upper layers as soon as possible (although it doesn't prioritise 
their insertion, the top block is likely to be inserted before a lot of the 
rest of the file).

A library of all likely files is going to be very useful to a good attacker...

We should also seriously consider encrypting each splitfile with a random 
overall key. I know we'd lose the CHK-collision effect (the same file inserted 
twice would no longer produce the same blocks), but it would make it 
impossible for an attacker to predict the blocks to be uploaded. On the other 
hand, the efficiency loss would be quite considerable... Hmmm...
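
A toy comparison of the two options, using a keyed hash (HMAC-SHA-256) as a 
stand-in for "encrypt the block under the overall file key, then hash it". 
This is not real encryption and not how the node computes CHKs; the function 
names are hypothetical:

```python
import hashlib
import hmac


def convergent_block_key(block):
    """Key depends only on the content: same block, same key."""
    return hashlib.sha256(block).hexdigest()


def randomized_block_key(block, file_key):
    """Key depends on the content and on a per-insert random file key.

    In a real insert, file_key would be fresh random bytes,
    e.g. os.urandom(32).
    """
    return hmac.new(file_key, block, hashlib.sha256).hexdigest()
```

With the first function two people inserting the same file produce the same 
keys, which gives us the collision effect but also lets the attacker predict 
them. With the second, the keys differ for every file key, so neither holds.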

Of course, the attacker can still do longer-term attacks based on e.g. the 
Frost identity posting the files, the freesite or Thaw index on which they 
are posted, over a period of time. One solution is to use no persistent 
identity or frequently change it; whether this is viable long-term given spam 
is an open question.

Another fair question: what is the impact of this on premix routing? Premix 
routing on a darknet obscures the insertor behind a premix cell of 100 or so 
nodes, each of which could have been the insertor. It prevents effective 
correlation attacks on your neighbours. Let's assume we have a large darknet, 
and we have a serious attacker, who is not strong enough to compromise a 
large fraction of the darknet (e.g. because of fear of discovery). He *is* 
strong enough to choose a direction based on incoming requests, and then 
compromise some nodes in that rough direction. If he can identify the keys 
that he is interested in, he can get an idea of which way to go, and 
progressively move closer to the target. Even with premix routing, he should 
be able to identify the cell which originated the request. If he is confident 
enough, and the cell is small enough, he can then bust all the nodes in that 
cell. So it is essential that premix cells be as large as possible, and that 
as little information is leaked as possible during inserts. However, we need 
to know the topology if we want to keep a strict darknet, so there is likely 
to be a practical limit on the size of a cell, and worse, the larger the cell, 
the larger the number of hops taken to traverse a path between 3 nodes within 
it. A cell needs to be fairly strongly connected to function, so it is likely 
that all nodes within a cell will be at approximately the same location.

Is there any more universal solution? Randomising the first few hops would not 
help much against a local/nearby attacker, hence the need for premix routing, 
but it might help to obscure things against an attacker a long way away? Once 
a request leaves premix routing, it could be routed randomly for a few hops, 
or routed to a random location. If an attacker intercepts it while it is 
still being randomly routed, he knows the originating cell is nearby, but if 
not, it will be coming from the wrong part of the keyspace. There will 
probably still be a bias towards the originator's real location, but it 
should be weaker...
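
A minimal sketch of the "route to a random location for a few hops" idea, 
assuming locations are numbers on a circular keyspace [0.0, 1.0) and that 
normal routing greedily picks the peer closest to the target. The hop counter 
and all function names are invented for illustration; this says nothing about 
how strong the remaining bias would be:

```python
import random


def circular_distance(a, b):
    """Distance between two locations on a circular keyspace [0.0, 1.0)."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


def next_hop(peer_locations, target):
    """Greedy step: pick the peer whose location is closest to the target."""
    return min(peer_locations, key=lambda loc: circular_distance(loc, target))


def routing_target(key_location, random_hops_left, rng):
    """Where to aim the request on this hop.

    While random hops remain, aim at a fresh random location; afterwards
    route towards the key as usual.
    """
    if random_hops_left > 0:
        return rng.random()
    return key_location
```

A node would call routing_target() with its own random.Random instance, then 
next_hop() on its peers' locations, decrementing the random hop counter as 
the request is forwarded.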

Some attack simulations would be really great. There was a paper posted here a 
while ago on this topic.
