On Monday 07 Jan 2013 19:39:40 Matthew Toseland wrote:
> On Monday 07 Jan 2013 18:11:14 Matthew Toseland wrote:
> > On Saturday 29 Dec 2012 14:53:51 Arne Babenhauserheide wrote:
> > > On Saturday 22 December 2012 02:17:24 Matthew Toseland wrote:
> > > > Hmmm, what about DoS attacks against the pre-insert cache? Maybe we should use the store; we do want to free up the slots when we've inserted successfully, though (in spite of random replacement, this still makes the store more efficient).
> > >
> > > Wouldn't the pre-insert keys never be requested, so they would drop out anyway after 14 days on average? (Following digger3's stats.)
> >
> > Yes. I didn't think it was that bad though. :|
> >
> > > > > I insert many chunks which go to the same part of the keyspace. When I send the reveal, I can start a distributed replacement attack.
> > > > >
> > > > > Any chance to avoid that?
> > > >
> > > > Ahhh, you mean you can bombard one keyspace location with lots of spam inserts, in the hope of driving a single document out, while avoiding the usual load-management limits between you and the target?
> > >
> > > Yes, exactly that.
> > >
> > > It allows attacking a given location without being detectable.
> > >
> > > Though that's essentially a condition of being anonymous, so I don't know if it is fixable...
> > >
> > > Essentially you can pretend to be many different people at the same time. If you can orchestrate the reveal, you could create very many inserts to some location arriving in one fast burst. And that could make it possible to kill a document with known content before anyone can download it.
> >
> > The attack isn't instant. You have to prepare it in advance and cannot retarget it. All that you gain from the random routing is avoiding the bottlenecks between one node and the target location.
> >
> > And yes, we will probably need random routing, for a whole set of reasons. Even with classic mixnet tunnels there will be multiple tunnels for an insert, and nothing to stop an attacker using more tunnels if he wants to.
> >
> > > > I guess it comes down to how effective the attack will be, given the structure of our datastores. How many keys on average do we need to send to a node to displace a target document, given that they are close to the target location so will be routed to the target node, and given that we don't know the salt value for that store? Won't that be a significant fraction of the datastore?
> > >
> > > That's a very good point.
> > >
> > > If we want to be reasonably sure that we can kill a given file, we might have to displace 90% of the store or so.
> >
> > Right. So we can simply impose a minimum store size of, say, 5GB, and this attack gets pretty hard.
> >
> > > I think normal insert behaviour is distributed pretty randomly across the keyspace, so a sudden surge of inserts to a given location could be detected as an attack.
> >
> > I'm not sure how easy it would be to detect.
> >
> > > And since that would very likely only happen during an attack, adding limits to displacement in the store should kill those attacks but only marginally affect regular inserts.
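As a sanity check on the 5GB figure, here is my back-of-envelope, assuming 32KiB CHK data blocks, uniformly random replacement, and that every attack insert actually lands on the target store. Displacing a given key with 95% confidence costs roughly 3N inserts, since ln(0.05) is about -3:

    from math import log

    BLOCK_SIZE = 32 * 1024      # bytes per CHK data block (assumed)
    STORE_SIZE = 5 * 1024 ** 3  # the suggested 5GB minimum store

    slots = STORE_SIZE // BLOCK_SIZE
    # P(key survives W uniformly random writes) = (1 - 1/N)^W, so 95%
    # displacement needs W = ln(0.05) / ln(1 - 1/N), roughly 3N writes.
    writes = log(0.05) / log(1.0 - 1.0 / slots)
    print("slots:", slots)                               # 163840
    print("inserts for 95% displacement:", int(writes))  # ~491000
    print("attack traffic: %.1f GiB" % (writes * BLOCK_SIZE / 1024.0 ** 3))

So even before any throttling, a single targeted displacement is on the order of 15GiB of successful inserts, all of which have to reach the victim's store.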
> > > In my 1GiB store, I have a write rate of 0.08/s and 13,414 keys, so if I interpret the data correctly, it takes about 47 hours until the number of write requests equals the number of keys.
> >
> > We have both the store and the cache. Both of these are in practice mostly specialised around our location.
> >
> > The store only gets written on inserts, and only when we're the sink. So remotely overwriting it from far away on the network may be rather difficult, unless they already know our location and peers - which they might. Conceivably we could detect an attempt to overwrite the *store* simply because the attacker would have to customise the keys' locations to ensure they are stored. It's not so easy with the cache, but I guess it doesn't matter as much with the cache? Or they could keep a plausible distribution of locations / a low hit rate, and require far more keys to overwrite everything...
> >
> > > To estimate the number of keys displaced after those ~47 hours:
> > >
> > > N: number of keys in the store.
> > > First key: 100% chance of displacing an old key.
> > > Second: (1 - 1/N) chance of displacing an old key.
> > > ...
> > > Since I stumbled over the math, I implemented it as a function (iterative, so it doesn't hit Python's recursion limit for big stores):
> > >
> > > def displacedold(n, steps):
> > >     # Chance that a given old key is still present after `steps`
> > >     # uniformly random writes: (1 - 1/n)**steps.
> > >     nth = 1.0
> > >     for step in range(steps):
> > >         nth *= 1 - 1.0 / n
> > >     return nth
> > >
> > > def displacedourkey(n, steps):
> > >     # Chance that our key has been displaced after `steps` writes;
> > >     # sums the per-write hit probability, and is equivalent to the
> > >     # closed form 1 - (1 - 1/n)**steps.
> > >     dis = 1.0 / n
> > >     for step in range(steps - 1):
> > >         dis += displacedold(n, step + 1) / n
> > >     return dis
> > >
> > > print(displacedourkey(100, 100))  # -> 0.63
> > > print(displacedourkey(100, 200))  # -> 0.86
> > > print(displacedourkey(100, 300))  # -> 0.95
> > > print(displacedourkey(100, 400))  # -> 0.98
> > >
> > > This stays similar for bigger numbers of keys.
> > >
> > > So to be 95% sure that you replaced a key, you have to create inserts of 3 times the size of the store, which in the case of my store would take about 6 days.
> > >
> > > If you could triple the rate of writes to my store with the distributed insert attack, you could decrease the lifetime to 2 days.
> > >
> > > But that could be stopped by throttling the rate of change of write traffic to the node (if that's possible). Then an attacker would have to keep inserting constantly and the attack would be thwarted.
> > >
> > > The attack draws its strength from being able to insert first and then use the reveal step to force a huge number of writes to a given location in a very short time. Essentially it allows storing up bandwidth and then releasing it in a much shorter timespan.
> >
> > Yep.
> >
> > > And that is not possible if you have to keep up the insert traffic over a longer timespan.
> > >
> > > And that can be stopped by throttling the rate of change of write traffic to the store. For example, it would be possible to limit the writes per minute to 3x the daily rate.
> >
> > Good point. If you have to maintain traffic over a long period, you are better off doing regular inserts. Of course, the other thing to take into account is that the capacity of the node is limited, so many of the inserts will be redirected away from the originally targeted node.
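Something like Arne's "3x the daily rate" throttle could look like the sketch below. This is purely illustrative: the class and all its names are mine, nothing like it exists in fred, and rejected inserts would be routed onwards rather than stored (see the discussion just below):

    import time
    from collections import deque

    class WriteThrottle(object):
        """Allow at most `burst` times the average per-minute write
        rate observed over the last day. Hypothetical sketch."""

        def __init__(self, burst=3.0, window=24 * 3600.0):
            self.burst = burst
            self.window = window
            self.writes = deque()  # timestamps of accepted store writes

        def allow(self, now=None):
            now = time.time() if now is None else now
            # Drop timestamps that have left the one-day window.
            while self.writes and self.writes[0] < now - self.window:
                self.writes.popleft()
            per_minute_avg = len(self.writes) * 60.0 / self.window
            last_minute = sum(1 for t in self.writes if t > now - 60.0)
            # max(..., 1.0) lets an idle store warm up again.
            if last_minute >= max(self.burst * per_minute_avg, 1.0):
                return False  # route the insert on, but don't store it
            self.writes.append(now)
            return True

At the measured 0.08 writes/s (~4.8/minute) this tolerates bursts of ~14 writes/minute, while the reveal-burst attack needs orders of magnitude more than that to displace anything quickly.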
> > If you are not targeting a specific, single node, you will need a lot more bandwidth, because it's hard to target the node exactly if you don't even know what its location is; so most of your inserts won't be stored. On the other hand, it doesn't matter as much if your inserts are redirected, since you're targeting a group of nodes - a location. But it will require a *lot* more bandwidth.
> >
> > If you are targeting a specific, single node, on opennet you can surround it (maybe indirectly, e.g. with a layer of innocent nodes), although we're working on ways to reduce that.
> >
> > > Then displacing a key in my store with high probability (95%) would take at least 2 days. And that is much longer than the time needed to complete an upload, announce it on Sone, and have someone download it.
> > >
> > > And as soon as the upload has been downloaded, it is in the cache and the attack is useless, because downloaders can heal the inserted key.
> >
> > I'm not sure I follow your scenario here. It takes longer to displace a key than to reinsert it under a different key?
> >
> > > (Please double-check my logic, though!)
> > >
> > > > If we can quantify it, e.g. say that it's possible but you'd need to continually commit a significant amount of bandwidth per targeted key, depending on its popularity, that's enough for me ...
> > >
> > > Throttling the write rate gives that.
> > >
> > > The attack tries to make all nodes around you send you inserts all of a sudden, so the write rate has a massive spike. Smooth that spike and the attack is mostly useless.
> >
> > So if our store rate suddenly spikes, we route the inserts anyway, but randomly refuse to store? We can't selectively reject inserts, because backoff doesn't care whether it's an insert or a request... I'm not sure whether exploiting such a countermeasure would be a cheap way to DoS requests by causing us to back off?
> >
> > > > > > chain are still connected it will prove to them that the insert has been
> > > > >
> > > > > How do they prove it?
> > > >
> > > > Forwarding a portion of the reveal request? Not all of it - we don't want them to be able to decode the block; X_n is probably enough?
> > >
> > > I don't know it well enough to say anything about that. But I trust your judgement here.
> > >
> > > > > http://en.wikipedia.org/wiki/Dining_cryptographers_problem
> > > >
> > > > Right, we can use provably anonymous broadcast groups, provided the message is very small and provided that it can be announced all at once, before there is any churn in the membership of the group. The problem comes if we can't do the reveal for the whole file in one broadcast, or, if we do two stages, if a node receives both the original and the reveal and so can connect the two broadcast groups. These sorts of things look very hard to execute, though, IMHO?
> > >
> > > I don't know enough to answer that.
> > >
> > > > previous, so there should be very few such levels. And we can use anonymous broadcast for the bottom layer.
> > >
> > > I'm a bit wary of broadcast, since that creates very many messages. Is it a real broadcast (reaching all nodes in the network) or limited somehow?
> >
> > I said "broadcast groups". We'd set up groups of nodes and use DC anonymous broadcast within a group. The groups would have a limited, fixed number of participants. This is very inefficient, so it could only be used for broadcasting K and X, or the top level of a multi-level reveal structure. Also it has a smaller anonymity set than using tunnels.
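For reference, here is a toy round of the dining-cryptographers broadcast linked above. This is just my own illustration, not anything implemented; a real group would also need pairwise key agreement over the encrypted links and collision/disruption handling:

    import os

    def dc_round(n_members, sender=None, message=b""):
        """One round of XOR-based DC-net broadcast. Every pair of
        members shares a random pad; each member announces the XOR of
        its pads, and the sender additionally XORs in the message. The
        XOR of all announcements equals the message, but nothing
        reveals which member sent it."""
        size = len(message)
        pads = {}  # pairwise shared pads (really agreed out of band)
        for i in range(n_members):
            for j in range(i + 1, n_members):
                pads[(i, j)] = os.urandom(size)

        def announce(member):
            out = bytearray(size)
            for (i, j), pad in pads.items():
                if member in (i, j):
                    for k in range(size):
                        out[k] ^= pad[k]
            if member == sender:
                for k in range(size):
                    out[k] ^= message[k]
            return bytes(out)

        combined = bytearray(size)
        for m in range(n_members):
            for k, byte in enumerate(announce(m)):
                combined[k] ^= byte
        return bytes(combined)

    print(dc_round(5, sender=2, message=b"K and X go here"))
    # -> b'K and X go here', with no way to tell it came from member 2.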
> > > But all in all, the scheme sounds pretty good to me. I don't have sufficient knowledge to really judge it completely, though.
> >
> > Thanks!
> >
> > My main concern is how this compares to a more traditional tunneling approach, given the reliability issues on Freenet and the fact that on darknet one hop of a tunnel will be many actual hops, and therefore will often be very slow (bottlenecked by the slowest node). On opennet we could conceivably use direct connections for tunnels, like I2P and Tor do, but bear in mind that opennet has other problems and we don't want to encourage its usage too much; an attacker can dominate the keyspace/topology and hence control most tunnels, for example. I.e. we can kill Sybil on darknet, but not on opennet.
> >
> > We have several options:
> > 1. Rendezvous tunnels. Effectively only a single-hop tunnel. Probably the cheapest, simplest option, but security is mediocre.
> > 2. Tunnels like I2P or Tor. Essentially real-time. Use many tunnels and create new ones constantly throughout an insert; an attacker can try to observe tunnel creation after he's seen the first one.
> > 3. Tunnels like Mixmaster. Store-and-forward, but not reliable, because of node uptime issues. Limit unreliability by having many backup routes at each point (which means the attacker is more likely to be one of them; these would probably have to be around a single location to limit extra routing?). And we'd still need to use multiple tunnels, though not as many. Chunk size may be limited by node uptime. We may need additional redundancy.
> > 4. Preinsert/reveal. Expensive, real-time tunnels for the reveals. Preinserts travel twice (or three times) the usual (storage) number of hops. May need additional redundancy, probably not as much as #3.
> >
> > #3 and #4 only work for inserts. #1 and #2 could work for downloads too. It's probably worth implementing more than one strategy, so that we have really good protection for bulk inserts and adequate protection for downloads. In general, though, securing downloads is harder, and tolerance of slowness is likely lower for downloads than for inserts.
> >
> > #3 is probably safer than #2.
> >
> > #4 is original and not well studied, but it looks to me like it is comparable to #3 for security.
> >
> > UI issues:
> > - #1, #2: no UI issues.
> > - #3: significant UI issues. The insert takes time to finish on store-and-forward, and it's probably best not to have any feedback?
> > - #4: Upload progress, then the reveal is very fast. There may or may not be feedback, but the reveal should happen quickly after the preinsert upload phase finishes, so it's not a big deal.
> >
> > PERFORMANCE DEPENDS ON NUMBER OF HOPS:
> >
> > Preinsert/reveal needs to use twice the number of hops used for normal storage. This may well be too big; currently it's ~20, which is more than it needs to be, partly because we want to store data in more than one place.
> >
> > Other forms of tunnels use, approximately, multiples of the network diameter, which could be less than the number of hops for storage. Let's say 8 or 10.
> >
> > So conceivably #3 could be faster than #4, because the data travels fewer hops.
> >
> > In an ideal world we'd combine #3 and #4, so we'd have many tunnels, all of which are totally useless to an attacker (unless they are happy to trace every tunnel!), and then one reveal. But that'd be rather slow! (The other problem is how we know when to post the reveal; hence it's not really fire-and-forget, which ideally we'd like to be true of #3, and anything that distributes the reveal will allow an attacker to identify the tunnels.)
>
> #3 is looking complex, actually.
>
> Reliability issues: What happens if we are transferring an onion and the node we are sending to goes down? We send it elsewhere. What happens if we are receiving an onion and the sender goes down? We assume they will send it elsewhere. What if both go down? Clearly this is not going to be 100% reliable. Multi-location keys may help, but they make the tunnels easily identified; see below.
>
> How do we notify the original inserter that the insert is completed? Ideally, we'd like to preserve the current form of inserts: insert the unpredictable data blocks, then insert the top block; the original inserter subscribes to the top block, along with everyone else (e.g. for a bookmarked freesite). But that means we need a way to ensure that the top block (or the top block plus announcement - a small number of predictable keys) is inserted *last*.
>
> We can do it on the network and make #3 true fire-and-forget (a sketch of the share-combining step follows the advantages below):
> - Choose a group of nodes X for the completion.
> - Insert through N tunnels, as many as needed for performance and reliability. Each one contains a separate onion route to X. When each tunneled insert group finishes, it uses the onion to send a message, including its own identity and a secret provided by the original inserter.
> - When all (or enough) of the inserts have reached X, they combine, using a shared secret scheme as well as encryption to X.
> - X (whichever one of the group is handling the completion) does something:
> -- Insert the top block (or a small set of blocks).
> -- Start a reveal protocol. If the lower blocks are predictable, it would be nice to have the option to do preinsert/reveal. But we still want fire-and-forget.
> - ORIGINAL INSERTER:
> -- Subscribe to the key, just like everyone else who is waiting for it.
> -- Fetch the data, just like everyone else waiting for it. (Possibly timing issues, depending on what the key is; e.g. is manual intervention needed?) Maybe through a tunnel, again depending on popularity and protocol.
> -- If the data is incomplete, identify which sub-onion-insert is responsible. Maybe we can avoid them in future. Then either:
> --- Partial reinsert, tunneled, possibly with blind/reveal, OR
> --- Reinsert from scratch.
> -- The second option is likely significantly safer. But if the fetch is tunneled, the first option may be acceptable.
> -- We could have the original inserter insert the top block, after testing; then the rendezvous is just to notify them that they can. But it's probably safer to avoid this. Tradeoff?
>
> ADVANTAGES:
> - True fire-and-forget.
> - Tunnels aren't perfect, and we're using many of them, so we take advantage of our existing security: nobody can identify the tunnels until the insert has finished.
> - If blocks below the top are unpredictable, we insert the top block only after the rest are in.
> - If blocks are predictable, we can still do preinsert/reveal, in spite of having tunneled everything.
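For concreteness, here is the share-combining step as n-of-n XOR secret sharing. "All (or enough)" would really want k-of-n (e.g. Shamir) so the completion can fire even if a tunnel fails; all names here are mine:

    import os

    def split_token(token, n):
        """n-of-n XOR secret sharing: every share is required."""
        shares = [os.urandom(len(token)) for _ in range(n - 1)]
        last = bytearray(token)
        for share in shares:
            for i, byte in enumerate(share):
                last[i] ^= byte
        return shares + [bytes(last)]

    def combine(shares):
        out = bytearray(len(shares[0]))
        for share in shares:
            for i, byte in enumerate(share):
                out[i] ^= byte
        return bytes(out)

    # The inserter hands each of the N tunneled insert groups one share
    # (encrypted to X, per the scheme above). X can reconstruct the
    # token - and so insert the top block or start the reveal - only
    # once every tunnel has finished and forwarded its share.
    token = os.urandom(32)
    shares = split_token(token, 5)
    assert combine(shares) == token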
> EXPENSIVE POSSIBILITIES: (SPECULATIVE!)
> - Healing. FEC should happen before decryption IMHO, e.g. for backups. So it could conceivably be done on the network.
> - Test the lower-level inserts by fetching a few random blocks (from a list provided by the originator, not known to the sub-inserters) before completing the insert.
> - Interesting, but probably not realistic, at least not without some means of payment. On darknet, that might be possible with some simple Ripple-like hop-by-hop accounting. Using a true currency such as Bitcoin is IMHO too politically/legally risky, as well as impractical. Also, IMHO, making such things cost-based is risky security-wise: who's gonna run stuff at a loss just to see the traffic? The CIA, of course! Never pay for anonymising services unless they are provably secure, e.g. coin blinding.
> - Plus, how do we know we can trust the group rendezvous point? Less of a problem if it's only doing a single insert.
>
> Something closely related:
>
> Dead man's switch protocol (the holding node's side is sketched below):
> - Generate dead man certificates.
> - Onion route them to various nodes on the network.
> - The nodes receiving them will, after a specified delay, probe a specific set of keys (different each time), and if there is no outcome, will either insert a key or start an onion route.
> - The latter triggers a combination like the #3 feedback above: if enough keys arrive, and if whatever conditions are correct on the recipient (e.g. time), it does something.
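As I read the protocol above, the holding node's side would look roughly like this. Entirely hypothetical; `fetch` and `fire` stand in for the real network operations (probing a key; inserting a key or starting an onion route):

    import time

    def hold_certificate(probe_sets, fetch, fire, delay):
        """Sketch of a node holding a dead man certificate. Each round
        it waits, then probes a fresh set of keys; if the owner has
        stopped checking in, it fires (which feeds the #3-style
        combination described above)."""
        for keys in probe_sets:       # a different key set each round
            time.sleep(delay)
            if not any(fetch(k) for k in keys):
                fire()                # no sign of life: trigger
                return
        # The owner stayed alive through every round; the certificate
        # simply expires.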
> General programmable solution? Maybe. We don't want anything that is over-reliant on a single node - so no ssh, no SVN, etc. But a lot of stuff can be built from small, strictly limited programs running on a handful of nodes.
>
> What distinguishes Freenet from Tor and I2P:
> - Freenet is a storage network.
> - Most of its traffic is non-realtime. (Not strictly true, but that's the intention.)
> - Freenet does not rely on any single node.
> - Freenet does not try to rely on, or support relying on, any single node.
>
> Preinsert tweaks:
> - Do we want to be able to probe and partially renew a preinsert? Possibly without fetching it, i.e. only the originator can do so.
> - Or should a preinsert be usable *only* by revealing it?
> - Do we want to be able to explicitly delete a preinsert, since it's not a real insert anyway? Of course this is unreliable ...
> - And/or give it an expiration date? This at least is probably a good idea, if we can implement it efficiently.
>
> CONCLUSIONS?
>
> We probably want all four options. And that's a lot of work! If that's true, the question is which is most urgent. I guess the answer is #1: rendezvous tunnels???

AFAICS the priority order is:
- Basic: random routing at high HTL on inserts.
- Fast but poor: rendezvous tunnels? Use for inserts by default?
- More advanced: some sort of expensive source-routed tunneling for top blocks / chat posts. Only used to insert the top block, one or two keys at a time. Does not work (well/at all/safely) for whole big inserts or reinserts.
- Reusable realtime tunneling for requests and realtime inserts.
- Offline fire-and-forget, store-and-forward tunneling for secure bulk inserts/reinserts.
- Distributed completion for bulk inserts.
- Possibly preinsert/reveal as well, with or without tunneling.
_______________________________________________
Devl mailing list
[email protected]
https://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl
