On Thursday 24 June 2010 18:28:42 Matthew Toseland wrote:
> On Thursday 24 June 2010 04:05:48 Evan Daniel wrote:
> > On Wed, Jun 23, 2010 at 5:43 PM, Matthew Toseland
> > <t...@amphibian.dyndns.org> wrote:
> > > On Wednesday 23 June 2010 20:33:50 Sich wrote:
> > >> On 23/06/2010 21:01, Matthew Toseland wrote:
> > >> > Insert a random, safe key
> > >> > This is much safer than the first option, but the key will be 
> > >> > different every time you or somebody else inserts the key. Use this if 
> > >> > you are the original source of some sensitive data.
> > >> >
> > >> >
> > >> Very interesting for filesharing if we split the file.
> > >> When some chunks are lost, you only have to reinsert the ones that are
> > >> lost... But then we use much more datastore... But it's more secure...
> > >> Losing datastore space is a big problem, no?
> > >
> > > If some people use the new key and some use the old then it's a problem. 
> > > If everyone uses one or the other it isn't. I guess this is another 
> > > reason to use par files etc (ugh).
> > >
> > > The next round of major changes (probably in 1255) will introduce 
> > > cross-segment redundancy, which should improve the reliability of really 
> > > big files.
> > >
> > > Long term we may have selective reinsert support, but of course that 
> > > would be nearly as unsafe as reinserting the whole file to the same key 
> > > ...
> > >
> > > If you're building a reinsert-on-demand based filesharing system let me 
> > > know if you need any specific functionality...
> > 
> > The obvious intermediate is to reinsert a small portion of a file.
> > The normal case is (and will continue to be) that when a file becomes
> > unretrievable, it's because one or more segments are only a couple of
> > blocks short of being retrievable.  If you reinsert say 8 blocks out
> > of each segment (1/32 of the file), you'll be reinserting on average 4
> > unretrievable blocks from each segment.  That should be enough in a
> > lot of cases.  This is probably better than selective reinsert (the
> > attacker doesn't get to choose which blocks you reinsert as easily),
> > though it does mean reinserting more blocks (8 per segment when merely
> > reinserting the correct 3 blocks might suffice).
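
A minimal sketch of this randomized partial reinsert selection, in Java since 
that's what the node is written in. All names here are illustrative, not 
existing Freenet API:

import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PartialReinsertPicker {

    private static final SecureRandom RNG = new SecureRandom();

    /**
     * Returns the indices of the blocks to reinsert for one segment,
     * e.g. pickBlocks(256, 8) for 8 blocks out of a 256-block segment.
     */
    public static List<Integer> pickBlocks(int blocksInSegment, int blocksToReinsert) {
        List<Integer> indices = new ArrayList<Integer>();
        for (int i = 0; i < blocksInSegment; i++)
            indices.add(i);
        // Shuffle with a cryptographic RNG so an attacker cannot predict
        // (or influence) which blocks get reinserted.
        Collections.shuffle(indices, RNG);
        return indices.subList(0, Math.min(blocksToReinsert, blocksInSegment));
    }
}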
> > 
> > The simple defense against a mobile opennet attacker that has been
> > proposed before would be particularly well suited to partial
> > randomized reinserts.  The insert comes with a time (randomized per
> > block, to some time a bit before the reinsert started), and is only
> > routed along connections that were established before that time, until
> > it reaches some relatively low HTL (10?).  This prevents the attacker
> > from moving during the insert.  On a large file that takes a long time
> > to insert, this is problematic, because there aren't enough
> > connections that are old enough to route along.  For a partial
> > reinsert, this is less of a concern, simply because it doesn't take as
> > long.
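
The connection-age restriction could look something like this sketch (again 
illustrative, not existing code): routing proceeds as normal, but while HTL is 
above some threshold we refuse to route to any peer whose connection is newer 
than the insert's timestamp:

import java.util.ArrayList;
import java.util.List;

public class AgedConnectionFilter {

    public interface Peer {
        /** When the connection to this peer was established (millis). */
        long connectionEstablishedAt();
    }

    /**
     * @param candidates  peers we would normally route to
     * @param insertStart the insert's randomized start timestamp
     * @param htl         current hops-to-live
     * @param minHtl      below this HTL the restriction is dropped (10?)
     */
    public static <P extends Peer> List<P> routable(List<P> candidates,
            long insertStart, int htl, int minHtl) {
        if (htl <= minHtl)
            return candidates; // restriction only matters near the source
        List<P> out = new ArrayList<P>();
        for (P p : candidates)
            if (p.connectionEstablishedAt() < insertStart)
                out.add(p);
        return out;
    }
}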
> 
> This is a very good point. What if we can improve on this further?
> 
> By implementing long-term requests, we could have *all* the requests for a 
> splitfile go out *at once*, be routed immediately, and then return the data 
> over a long period. This means that:
> 1) The data needs to be trickled back even if nodes go offline - either via 
> rerouting (but consider carefully how to make this safe, e.g. establishing 
> backup routes at the time, or using a node identifier for the next hop so we 
> can reroute via FOAFs without involving the originator, and so not give away 
> a data point), or by waiting for the nodes to come back online.
> 2) Load management needs to be able to deal with the fact that we have 
> thousands of requests in flight. This means it may not work on opennet, 
> because there is no underlying trust; although we could maybe have a 
> reputation system to build up some amount of trust. Trust can be translated 
> into capacity limits.
> 3) The mobile attacker defence holds: If all the requests get routed inside a 
> few minutes, and then return data along fixed paths, the attacker has no 
> chance of moving towards the originator. And this works even for fairly big 
> files, without the overhead of tunnels, for requests and inserts of 
> predictable data.
> 4) Overheads should be reasonable, because we can bundle a large number of 
> requests together efficiently.
> 5) We get "burst" behaviour. If we have a fast connection, the data will be 
> returned fast.
> 6) We get slow-return behaviour. In many cases it will take a long time for 
> the data to trickle back. At each hop it will make sense to send one key at a 
> time, if we happen to have multiple keys fully available.
> 7) The node needs to be able to cope with a large number of requests pending: 
> We can keep the current code for routing them but once we have a route, the 
> requests as well as the transfers need to be threadless.
> 8) We need to be able to specify that we want a fast response on a request 
> and that it should not get queued and trickled through/around offline nodes.
> 9) We need a different way to handle timeouts. At the moment we detect when a 
> node is busted by not getting the data within X period. If the node has a 
> backlog of thousands of blocks then this is clearly not going to work. 
> However we do expect a response eventually. So we need some way to take this 
> into account and identify problems.
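
For point 9, one possibility is a backlog-aware deadline rather than a fixed 
"data within X" check. Purely a sketch, with made-up names and constants:

public class BacklogAwareTimeout {

    private final long basePerHopMillis;     // fixed allowance, as now
    private final long perQueuedBlockMillis; // observed per-block transfer time

    public BacklogAwareTimeout(long basePerHopMillis, long perQueuedBlockMillis) {
        this.basePerHopMillis = basePerHopMillis;
        this.perQueuedBlockMillis = perQueuedBlockMillis;
    }

    /**
     * Deadline for one pending transfer: the fixed allowance plus time for
     * everything queued ahead of us on that link. A node is only considered
     * busted if it blows this adjusted deadline, so a large backlog no
     * longer looks like a failure, but we still expect a response eventually.
     */
    public long deadlineMillis(long now, int blocksQueuedAhead) {
        return now + basePerHopMillis + (long) blocksQueuedAhead * perQueuedBlockMillis;
    }
}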
> 
> This is an elaboration on a chain of thought that goes way back, thanks 
> particularly to the person who came up with the mobile attacker 
> countermeasure, but we've been pondering passive requests for a long time ... 
> It also combines ideas from CCN's hands off load management and various 
> people who have suggested load management changes over the years e.g. on 
> Frost...
> 
> This looks like a major rewrite, maybe 0.10 era, but it would be worth it, 
> the potential benefits are rather enormous...
> 
> Of course we'd still need tunnels for e.g. Frost posts, stuff where you have 
> multiple completed tasks over a period where the attacker could get a sample 
> from each one.
> 
> PRINCIPLES IN DETAIL:
> 
> 1. We send requests when we have requests. We queue them as little as 
> possible. Of course we have a queue of stuff we want to fetch - we recognise 
> blocks that we want - but in terms of the actual queues of requests to send, 
> we minimise this - stuff is sent immediately or near to immediately.
> 2. We send all the requests for a single bundle at once. We queue other 
> bundles if necessary to fit into our capacity limit. If two bundles are 
> related (e.g. different levels of a splitfile or frost posts), we try to send 
> them close together.
> 3. Any request in a bundle has a starting timestamp, which is the same or 
> very close for all the requests in the bundle. We don't route to any node 
> that has connected since the starting timestamp.
> 4. Any given link (connection between a pair of nodes) has a capacity in 
> outstanding requests. On darknet this is fixed, or depends on the trust level 
> for the connection. On opennet this is fixed but much lower, or depends on 
> some sort of trust/history/reputation algorithm. (A sketch of this capacity 
> accounting follows after this list.)
> 5. Requests (and inserts) are routed reasonably quickly, but may be queued 
> briefly if necessary to achieve good routing.
> 6. Once a request has reached its destination, where the data is available, 
> it is converted into a pending transfer. We exit the request thread and 
> handle it asynchronously.
> 7. On any link there may be a large number of pending transfers, up to a 
> given capacity. If a request fails, we free up the capacity; hence we keep 
> the number of pending transfers under the limit. As long as we have data to 
> transfer, we transfer it. If we have a full block (e.g. if we are the data 
> source), we transfer that first, then look for another one, then look for 
> other blocks, etc.
> 8. As long as data is being transferred over a link, and requests are being 
> completed regularly (this is why we try to transfer one block at a time 
> rather than multiplexing everything), we don't timeout any of the transfers. 
> We may have to incorporate age into the error detection somehow.
> 9. As a request reaches each hop, we give it the identifier for a node for an 
> alternative path. This can either be the previous hop or another node which 
> has agreed to be a fallback path, depending on security requirements. The 
> node identifier must be accessible within two hops of the node being routed 
> to. If the predecessor node goes offline, we will try to route to the 
> fallback path. If the fallback path is offline, we check whether we have the 
> persistent flag (which may be disallowed on opennet): If we have the 
> persistent flag we suspend the request until either the predecessor or the 
> fallback comes online, and write it to disk. If we don't have the persistent 
> flag, we store the returning data locally only, and penalise the predecessor 
> if it comes back online. Once we have sent a request we are committed to 
> accepting the returned data; this is essential to prevent probing-based 
> censorship attacks. This is why opennet is a problem with this scheme: If you 
> get any nontrivial trust from announcing, you can announce, connect to nodes, 
> use that capacity to DoS them and then disconnect and talk to another node. 
> However, if everyone has fast links, it could return data very quickly on 
> opennet...
> 10. If our capacity limits are low then we will have to send requests (or 
> inserts) for big files in a series of separate bundles. We warn the user that 
> we cannot guarantee full security, and get a confirmation. Unless seclevel = 
> LOW, in which case we just proceed anyway without asking. One interesting 
> point is that this allows us to quantify the loss of security on opennet, 
> although arguably that will just result in more people getting connections to 
> random strangers out of band, many of whom will turn out to be NSA...
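
The capacity accounting from points 4 and 7 could be as simple as a counting 
semaphore per link. Sketch only; the real thing would need persistence and 
trust-based sizing:

import java.util.concurrent.Semaphore;

public class LinkCapacity {

    private final Semaphore slots;

    /** maxOutstanding: fixed (or trust-scaled) on darknet, much lower on opennet. */
    public LinkCapacity(int maxOutstanding) {
        slots = new Semaphore(maxOutstanding);
    }

    /** Reserve a slot before accepting a request over this link. */
    public boolean tryAccept() {
        return slots.tryAcquire();
    }

    /** Free the slot when the transfer completes or the request fails. */
    public void release() {
        slots.release();
    }
}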
> 
> Of course there are security risks with bursting, in that if an attacker has 
> access to traffic analysis as well as nodes close to the originator, he may 
> be able to tie the two together. But the above is not limited to bursting: 
> The principle is we burst the routing phase and then return the data as fast 
> or as slow as possible. It is compatible with sneakernet, it is compatible 
> with CBR links.
> 
> Ideas? Challenges? Suggestions for how to deal with opennet in this framework?
> 
Backup routes should be established by the node being relayed through, not by 
its predecessor, so as not to give anything away. They should also be 
established lazily, only after some time has elapsed and the data hasn't all 
been transferred, or only if the data is over a certain size.

A -> B -> C -> D

After a while, B chooses a backup node E. B asks E whether it is willing to 
be a backup route for a given amount of data. If E says yes, B gives it the 
bundle ID, and tells A about E. By the small world property there will be a 
quick route from A to E and from C to E. We can probably pre-select to avoid 
cases where this doesn't hold (e.g. based on counting common neighbours then 
ranking and taking the upper half of the table and then choosing randomly from 
that), without giving away too much information by our choice of E.
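
Sketch of that pre-selection (illustrative; commonNeighboursWith is assumed to 
be computable from the FOAF data we already exchange):

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public class BackupNodeSelector {

    public interface Node {
        /** Number of neighbours shared with the given node, from FOAF data. */
        int commonNeighboursWith(Node other);
    }

    /** Rank by common neighbours, keep the upper half, pick at random. */
    public static Node chooseBackup(final Node self, List<? extends Node> candidates,
            Random rng) {
        if (candidates.isEmpty())
            throw new IllegalArgumentException("no candidates for backup route");
        List<Node> ranked = new ArrayList<Node>(candidates);
        Collections.sort(ranked, new Comparator<Node>() {
            public int compare(Node a, Node b) {
                // Descending by common-neighbour count.
                return b.commonNeighboursWith(self) - a.commonNeighboursWith(self);
            }
        });
        List<Node> upperHalf = ranked.subList(0, Math.max(1, ranked.size() / 2));
        // A random choice within the upper half leaks little about A or C.
        return upperHalf.get(rng.nextInt(upperHalf.size()));
    }
}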

When B goes down, C broadcasts to all B's neighbours, via the FOAF fabric, a 
one-way function of the bundle ID, as a challenge. E responds with a different 
one-way function of the bundle ID, establishing a connection. This could be 
encrypted, but IMHO that isn't necessary. A also notices that B has gone down 
and sends a message to E, again authenticated by its knowledge of the bundle 
ID. So we now have a short route for the data return.
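
The two one-way functions just need to be domain-separated, so that seeing the 
challenge doesn't let a bystander forge the response; only a node that was 
given the bundle ID can compute either. E.g. (sketch):

import java.io.UnsupportedEncodingException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class BundleChallenge {

    private static byte[] hash(String domain, byte[] bundleId) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(domain.getBytes("UTF-8"));
            md.update(bundleId);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new Error(e); // SHA-256 is always available
        } catch (UnsupportedEncodingException e) {
            throw new Error(e); // UTF-8 is always available
        }
    }

    /** Broadcast by C (via B's neighbours) when B goes down. */
    public static byte[] challenge(byte[] bundleId) {
        return hash("bundle-challenge", bundleId);
    }

    /** Sent back by E to prove it holds the same bundle ID. */
    public static byte[] response(byte[] bundleId) {
        return hash("bundle-response", bundleId);
    }
}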

Another option would be to do it completely on-demand: Both A and C broadcast 
to some depth, say 2 hops (giving a maximum relayed route length of 4 hops). 
Given the overheads of such a broadcast, it would have to be for all the 
bundles going between A and C, but this is okay as timing-wise any attacker 
will discover the correlation anyway; and this can be implemented without C 
discovering A. A computes H_a(ID) and H_b(ID), C computes H_a(ID) and H_c(ID), 
they broadcast their respective hashes. When they meet they are relayed, along 
with a hop count. The shortest hop count wins. Hmmm, we might want to encrypt 
it after all...
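
An intermediate node in such a broadcast only needs to remember which tokens 
it has seen from each direction and notice when they meet, preferring the 
shortest combined path. Sketch (assuming, for simplicity, a common token both 
endpoints derive from the bundle ID):

import java.util.HashMap;
import java.util.Map;

public class RendezvousTable {

    /** token (hex) -> best hop count seen so far from each side */
    private final Map<String, Integer> fromA = new HashMap<String, Integer>();
    private final Map<String, Integer> fromC = new HashMap<String, Integer>();

    /**
     * Record a broadcast arriving here. Returns the combined route length
     * if the two broadcasts have now met at this node, or -1 otherwise.
     */
    public int seen(String token, boolean fromASide, int hops) {
        Map<String, Integer> mine = fromASide ? fromA : fromC;
        Map<String, Integer> theirs = fromASide ? fromC : fromA;
        Integer best = mine.get(token);
        if (best == null || hops < best)
            mine.put(token, hops);
        Integer match = theirs.get(token);
        return match == null ? -1 : hops + match;
    }
}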

If there are lots of backup routes in a route, it may make sense to have some 
voluntary path shortening. I'm not sure how to do this securely, although in 
the above, if A is within range of D, things might be interesting... 

Anything that involves A broadcasting results in a predecessor sample, however, 
so security-wise it is probably better to go with a prearranged backup route, 
which minimises this: if the fallback path is not activated, E does not know 
anything, and even if it is activated, there is no possibility of mobile 
attacker source tracing, because E was chosen relatively early on (or B chose 
it late on but restricted its choice to nodes online when the request was 
originally relayed). Of course this means that if the set of online nodes 
changes significantly we can end up losing the route completely, in which case 
we can either wait until nodes come back online, or we can compromise security 
to reroute from A to a node that wasn't online when the original request went 
out.

How does rerouting from A affect mobile attacker source tracing exactly? If an 
attacker is distant, and he receives a sample by being in the path of the 
request, he can try to move towards the origin of the data, but he won't 
receive any more data so cannot accelerate his attack; his success is 
determined simply by the number of nodes he has at the time of the original 
request. However, if we reroute to nodes that were added since the original 
request, he has a nonzero chance of intercepting more data and so being able to 
move towards the originator. So we have 3 options:
1) Don't reroute at all. Use the original path or nothing and hope the 
redundancy is enough.
2) Establish alternate routes in advance. Use them but if they fail give up.
3) Reroute on demand, possibly after using pre-established alternate routes.

We can specify behaviour in the bundle:
FAST: Reroute on demand using limited broadcasts.
SECURE: Establish multiple alternate routes in advance. Use them but if they 
fail give up (or wait for nodes to come back online).
MIXED: Secure then fast.
TRANSIENT: Do not establish alternate routes. Fail or wait when our original 
routes go down.
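
In code terms this is just a per-bundle flag; the names below are only 
placeholders:

public enum RerouteBehaviour {
    /** Reroute on demand using limited broadcasts. */
    FAST,
    /** Pre-establish alternate routes; if they fail, give up or wait. */
    SECURE,
    /** Try pre-established routes first, then fall back to broadcasts. */
    MIXED,
    /** No alternate routes: fail or wait when the original path dies. */
    TRANSIENT;

    /** Whether this policy may ever route via nodes newer than the bundle. */
    public boolean allowsOnDemandBroadcast() {
        return this == FAST || this == MIXED;
    }
}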

The catch with backup routes is of course that they tend to be longer than the 
original data return route. It would be nice if we could avoid triggering the 
backup routes until we actually need them: In a popular splitfile for example, 
the redundancy in the returned data may allow us to reconstruct even if many of 
the pathways downstream fail. One way to implement this would be for A to wait 
a while before contacting C. However, C is committed to receiving the data, 
because there is no abort. If an alternative route to A is found, then C will 
have less reason to penalise B if (when!) it reconnects.

Should there be an abort option? The problem is that being able to start a 
transfer and then abort it facilitates censorship attacks. Classically Freenet 
resists censorship because if you fetch a key and take out the node that 
returned it, the data will have been cached downstream.

We should check fairly urgently whether the current code allows for aborting 
transfers all the way down; I think it did, but I fixed it.

Another interesting point with all this is it requires quite a lot of disk 
space - the maximum in-flight data for each peer.
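
To put rough numbers on it: with 32KB data blocks and a (purely illustrative) 
limit of 1024 pending transfers per peer, that is 32MB per peer, so a node 
with 40 peers could need to reserve over 1GB of disk for in-flight data in the 
worst case.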
