On Monday 07 Jan 2013 18:11:14 Matthew Toseland wrote:
> On Saturday 29 Dec 2012 14:53:51 Arne Babenhauserheide wrote:
> > On Saturday, 22 December 2012, 02:17:24, Matthew Toseland wrote:
> > > Hmmm, what about DoS attacks against the pre-insert cache? Maybe we
> > > should use the store; we do want to free up the slots when we've
> > > inserted successfully though (in spite of random replacement this
> > > still makes the store more efficient).
> >
> > Wouldn’t the pre-insert keys never be requested, so they would drop
> > out anyway after 14 days on average? (following digger3’s stats)
>
> Yes. I didn't think it was that bad though. :|
>
> > > > I insert many chunks which go to the same part of the keyspace.
> > > > When I send the reveal, I can start a distributed replacement
> > > > attack.
> > > >
> > > > Any chance to avoid that?
> > >
> > > Ahhh, you mean you can bombard one keyspace location with lots of
> > > spam inserts, in the hope of driving a single document out, while
> > > avoiding the usual load management limits between you and the
> > > target?
> >
> > Yes, exactly that.
> >
> > It allows attacking a given location without being detectable.
> >
> > Though that’s essentially a consequence of being anonymous, so I
> > don’t know if it is fixable…
> >
> > Essentially you can pretend to be many different people at the same
> > time. If you can orchestrate the reveal, you could create very many
> > inserts to some location coming in one fast burst. And that could
> > make it possible to kill a document with known content before anyone
> > can download it.
>
> The attack isn't instant. You have to prepare it in advance and cannot
> retarget it. All that you get from the random routing is avoiding the
> bottlenecks between one node and the target location.
>
> And yes, we will probably need random routing, for a whole set of
> reasons.
> Even with classic mixnet tunnels there will be multiple tunnels for an
> insert, and nothing to stop an attacker using more tunnels if he wants
> to.
>
> > > I guess it comes down to how effective the attack will be, given
> > > the structure of our datastores. How many keys on average do we
> > > need to send to a node to displace a target document, given they
> > > are close to the target location so will be routed to the target
> > > node, and given we don't know the salt value for that store? Won't
> > > that be a significant fraction of the datastore?
> >
> > That’s a very good point.
> >
> > If we want to be reasonably sure that we can kill a given file, we
> > might have to displace 90% of the store or so.
>
> Right. So we can simply impose a minimum store size of, say, 5GB, and
> this attack gets pretty hard.
>
> > I think normal insert behaviour is distributed pretty randomly across
> > the keyspace, so a sudden surge of inserts to a given location could
> > be detected as an attack.
>
> I'm not sure how easy it would be to detect.
>
> > And since that would very likely only happen during an attack, adding
> > limits to displacement in the store should kill those attacks but
> > only marginally affect regular inserts.
> >
> > In my 1GiB store, I have a write rate of 0.08/s and 13,414 keys, so
> > if I interpret the data correctly, it takes about 40h until the
> > number of write requests equals the number of keys.
>
> We have both the store and the cache. Both of these are in practice
> mostly specialised around our location.
>
> The store only gets written on inserts, and only when we're the sink.
> So remotely overwriting it from far away on the network may be rather
> difficult, unless they already know our location and peers - which they
> might. Conceivably we could detect an attempt to overwrite the *store*
> simply because the attacker would have to customise the keys' locations
> to ensure they are stored.
> It's not so easy with the cache, but I guess it doesn't matter as much
> with the cache? Or they could keep a plausible distribution of
> locations / low hit rate, and require far more keys to overwrite
> everything...
>
> > To estimate the number of keys displaced after 40h:
> >
> > N: number of keys in the store.
> > First key: 100% chance of displacing an old key.
> > Second: (1 - 1/N) chance of displacing an old key.
> > …
> > Since I stumbled over the math, I implemented it as a recursive
> > function:
> >
> > def displacedold(n, steps, step=0, nth=None):
> >     if step == steps:
> >         return nth
> >     if step == 0:
> >         return displacedold(n, steps, step=step+1, nth=1 - 1.0/n)
> >     return displacedold(n, steps, step=step+1, nth=nth*(1 - 1.0/n))
> >
> > def displacedourkey(n, steps):
> >     dis = 1.0/n
> >     for step in range(steps-1):
> >         dis += displacedold(n, step+1)/n
> >     return dis
> >
> > print displacedourkey(100, 100)
> > → 0.63
> > print displacedourkey(100, 200)
> > → 0.86
> > print displacedourkey(100, 300)
> > → 0.95
> > print displacedourkey(100, 400)
> > → 0.98
> >
> > This stays similar for bigger numbers of keys, but my algorithm is
> > bad enough that it quickly exceeds the maximum recursion depth :)
> >
> > So to be 95% sure that you have replaced a key, you have to create
> > inserts of 3 times the size of the store, which in the case of my
> > store would take about 6 days.
> >
> > If you could triple the rate of writes to my store with the
> > distributed insert attack, you could decrease the lifetime to 2 days.
> >
> > But that could be stopped by throttling the rate of change of write
> > traffic to the node (if that’s possible). Then an attacker would have
> > to keep inserting constantly and the attack would be thwarted.
> >
> > The attack draws its strength from being able to insert first and
> > then using the reveal step to force a huge number of writes to a
> > given location in a very short time.
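Incidentally, the recursion above collapses to a closed form: after s
uniformly random writes to a full store of n slots, a given old key
survives with probability (1 - 1/n)^s, so the displacement probability is
1 - (1 - 1/n)^s. A short sketch (the function name is mine) reproducing
the figures quoted above:

```python
def displaced_probability(n, steps):
    """Probability that a specific key in a store of n slots has been
    displaced after `steps` writes with uniform random replacement.

    Closed form of the quoted recursion: 1 - (1 - 1/n)**steps."""
    return 1.0 - (1.0 - 1.0 / n) ** steps

print(round(displaced_probability(100, 100), 2))  # 0.63
print(round(displaced_probability(100, 300), 2))  # 0.95
```

Unlike the recursive version, this handles realistically large stores
(millions of keys) without hitting any recursion limit.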
> > Essentially it allows storing up bandwidth and then releasing it in
> > a much shorter timespan.
>
> Yep.
>
> > And that is not possible if you have to keep up the insert traffic
> > over a longer timespan.
> >
> > And that can be stopped by throttling the rate of change of write
> > traffic to the store. For example, it would be possible to limit the
> > writes per minute to 3× the daily rate.
>
> Good point. If you have to maintain traffic over a long period, you are
> better off doing regular inserts. Of course, the other thing to take
> into account is that the capacity of the node is limited, so many of
> the inserts will be redirected away from the originally targeted node.
>
> If you are not targeting a specific, single node, you will need to use
> a lot more bandwidth, because it's hard to target the node exactly if
> you don't even know what its location is; so most of your inserts won't
> be stored. However, it doesn't matter as much if your inserts are
> redirected, since you're targeting a group of nodes, a location. But it
> will require a *lot* more bandwidth.
>
> If you are targeting a specific, single node, on opennet you can
> surround it (maybe indirectly, e.g. with a layer of innocent nodes),
> although we're working on ways to reduce that.
>
> > Then displacing a key in my store with high probability (95%) would
> > take at least 2 days. And that is much longer than the time needed
> > for completing an upload, announcing the upload on Sone and someone
> > downloading it.
> >
> > And as soon as the upload has been downloaded, it is in the cache and
> > the attack is useless, because downloaders can heal the inserted key.
>
> I'm not sure I follow your scenario here. It takes longer to displace a
> key than to reinsert it to a different key?
>
> > (please doublecheck my logic, though!)
>
> > > If we can quantify it, e.g.
> > > say that it's possible but you'd need to continually commit a
> > > significant amount of bandwidth per targeted key depending on its
> > > popularity, that's enough for me ...
> >
> > Throttling the write rate gives that.
> >
> > The attack tries to make all nodes around you send you inserts all of
> > a sudden, so the write rate has a massive spike. Smooth that spike
> > and the attack is mostly useless.
>
> So if our store rate suddenly spikes, we route the inserts anyway, but
> randomly refuse to store? We can't selectively reject inserts because
> backoff doesn't care whether it's an insert or a request... I'm not
> sure whether exploiting such a countermeasure would be a cheap way to
> DoS requests by causing us to back off?
>
> > > > > chain are still connected it will prove to them that the insert
> > > > > has been
> > > >
> > > > How do they prove it?
> > >
> > > Forwarding a portion of the reveal request? Not all of it, we don't
> > > want them to be able to decode the block; X_n is probably enough?
> >
> > I don’t know it well enough to say something about that. But I trust
> > your judgement here.
> >
> > > > http://en.wikipedia.org/wiki/Dining_cryptographers_problem
> > >
> > > Right, we can use provably anonymous broadcast groups provided the
> > > message is very small, and provided that it can be announced all at
> > > once, before there is any churn in the membership of the group. The
> > > problem comes if we can't do the reveal for the whole file in one
> > > broadcast, or, if we do two stages, if a node receives both the
> > > original and the reveal so can connect the two broadcast groups.
> > > These sorts of things look very hard to execute though IMHO?
> >
> > I don’t know enough to answer that.
> >
> > > previous, so there should be very few such levels. And we can use
> > > anonymous broadcast for the bottom layer.
> >
> > I’m a bit wary of broadcast, since that creates very many messages.
> > Is it a real broadcast (reaching all nodes in the network) or limited
> > somehow?
>
> I said "broadcast groups". We'd set up groups of nodes and use DC
> anonymous broadcast within a group. The groups would have a limited,
> fixed number of participants. This is very inefficient, so could only
> be used for broadcasting K and X, or the top level of a multi-level
> reveal structure. Also it has a smaller anonymity set than using
> tunnels.
>
> > But all in all, the scheme sounds pretty good to me. I don’t have
> > sufficient knowledge to really judge it completely, though.
>
> Thanks!
>
> My main concern is how this compares to a more traditional tunneling
> approach, given the reliability issues on Freenet and the fact that on
> darknet one hop of a tunnel will be many actual hops and therefore will
> often be very slow (bottlenecked by the slowest node). On opennet
> conceivably we could use direct connections for tunnels, like I2P and
> Tor do, but bear in mind that opennet has other problems and we don't
> want to encourage its usage too much; an attacker can dominate the
> keyspace/topology and hence control most tunnels, for example. I.e. we
> can kill Sybil on darknet, but not on opennet.
>
> We have several options:
> 1. Rendezvous tunnels. Effectively only a single-hop tunnel. Probably
> the cheapest, simplest option, but security is mediocre.
> 2. Tunnels like I2P or Tor. Essentially real-time. Use many tunnels
> and create new ones constantly throughout an insert; the attacker can
> try to observe tunnel creation after he's seen the first one.
> 3. Tunnels like Mixmaster. Store-and-forward, but not reliable,
> because of node uptime issues. Limit unreliability by having many
> backup routes at each point (which means the attacker is more likely to
> be one of them; these would probably have to be around a single
> location to limit extra routing?). And we'd still need to use multiple
> tunnels, though not as many. Chunk size may be limited by node uptime.
> We may need additional redundancy.
> 4. Preinsert/reveal. Expensive, real-time tunnels for the reveals.
> Preinserts travel twice (or 3×) the usual (storage) number of hops.
> May need additional redundancy, probably not as much as #3.
>
> #3 and #4 only work for inserts. #1 and #2 could work for downloads
> too. It's probably worth implementing more than one strategy, so we
> have really good protection for bulk inserts, and adequate protection
> for downloads. However, in general, securing downloads is harder, and
> tolerance of slowness for downloads is likely less than for inserts.
>
> #3 is probably safer than #2.
>
> #4 is original, not well-studied, but it looks to me like it is
> comparable to #3 for security.
>
> UI issues:
> - #1, #2: no UI issues.
> - #3: significant UI issues. It takes time to finish the insert on
> store-and-forward, and it's probably best not to have any feedback?
> - #4: upload progress, then the reveal is very fast. There may or may
> not be feedback, but it should happen quickly after the preinsert
> upload phase finishes, so not a big deal.
>
> PERFORMANCE DEPENDS ON NUMBER OF HOPS:
>
> Preinsert/reveal needs to use twice the number of hops used for normal
> storage. This may well be too big; currently it's ~20. It's bigger
> than it needs to be partly because of wanting to store data in more
> than one place.
>
> Other forms of tunnels use, approximately, multiples of the network
> diameter, which could be less than the number of hops for storage.
> Let's say 8 or 10.
>
> So conceivably #3 could be faster than #4 because data travels fewer
> hops.
>
> In an ideal world we'd combine #3 and #4 so we have many tunnels, all
> of which are totally useless to an attacker (unless they are happy to
> trace every tunnel!), then one reveal. But that'd be rather slow!
> (The other problem is how do we know when to post the reveal?
> Hence it's not really fire-and-forget, which ideally we'd like to be
> true of #3; and anything that distributes the reveal will allow an
> attacker to identify the tunnels.)
>
> #3 is looking complex, actually.
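To put rough numbers on the hop-count comparison above (the ~20 storage
hops and ~10 network diameter come from this thread; the tunnel multiple
is an assumption for illustration):

```python
# Back-of-envelope hop counts for the insert-privacy options discussed
# in this thread. STORAGE_HOPS and DIAMETER are the thread's figures;
# the tunnel multiple is an assumed small factor.
STORAGE_HOPS = 20   # current typical hops for a normal insert
DIAMETER = 10       # assumed network diameter

def preinsert_reveal_hops(factor=2):
    # Option 4: preinserts travel 2x (or 3x) the usual storage hops.
    return factor * STORAGE_HOPS

def tunnel_hops(multiple=2):
    # Options 2/3: roughly a small multiple of the network diameter.
    return multiple * DIAMETER

print(preinsert_reveal_hops())   # 40
print(preinsert_reveal_hops(3))  # 60
print(tunnel_hops())             # 20
```

So under these assumptions data in #3 travels about half the hops of #4,
which is where "#3 could be faster than #4" comes from.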
Reliability issues: What happens if we are transferring an onion and the
node we are sending to goes down? We send it elsewhere. What happens if
we are receiving an onion and the sender goes down? We assume they will
send it elsewhere. What if both go down? Clearly this is not going to be
100% reliable. Multi-location keys may help, but make the tunnels easily
identified; see below.

How do we notify the original inserter that the insert is completed?
Ideally, we'd like to preserve the current form of inserts: insert the
unpredictable data blocks, then insert the top block; the original
inserter subscribes to the top block, along with everyone else (e.g. for
a bookmarked freesite). But that means we need a way to ensure that the
top block (or top block plus announcement - a small number of
predictable keys) is inserted *last*.

We can do it on the network and make #3 true fire-and-forget:
- Choose a group of nodes X for the completion.
- Insert through N tunnels, as many as needed for performance and
reliability. Each one contains a separate onion routing to X. When each
tunneled insert group finishes, it uses the onion to send a message,
including its own identity, and a secret provided by the original
inserter.
- When all (or enough) of the inserts have reached X, they combine,
using a shared secret scheme as well as encryption to X.
- X (whichever one of the group is handling the completion) does
something:
-- Insert the top block (or a small set of blocks).
-- Start a reveal protocol. If the lower blocks are predictable, it
would be nice to have the option to do preinsert/reveal. But we still
want fire-and-forget.
- ORIGINAL INSERTER:
-- Subscribe to the key, just like everyone else who is waiting for it.
-- Fetch the data, just like everyone else waiting for it. (Timing
issues possibly, depending on what the key is, e.g. is manual
intervention needed?) Maybe through a tunnel, again depending on
popularity and protocol.
-- If the data is incomplete, identify which sub-onion-insert is
responsible. Maybe we can avoid them in future. Then:
--- Partial reinsert, tunneled, possibly with blind/reveal, OR
--- Reinsert from scratch.
-- The second option is likely significantly safer. But if the fetch is
tunneled the first option may be acceptable.
-- We could have the original inserter insert the top block, after
testing, then the rendezvous is just to notify them that they can. But
it's probably safer to avoid this. Tradeoff?

ADVANTAGES:
- True fire-and-forget.
- Tunnels aren't perfect, and we're using many of them. So take
advantage of our existing security; nobody can identify tunnels until
the insert is finished.
- If blocks are unpredictable below the top, insert the top block only
after the rest are in.
- If blocks are predictable, we can still do preinsert/reveal, in spite
of having tunneled everything.

EXPENSIVE POSSIBILITIES: (SPECULATIVE!)
- Healing. FEC should happen before decryption IMHO, e.g. for backups.
So it could conceivably be done on the network.
- Test the lower-level inserts by fetching a few random blocks (list
provided by the originator, not known to the sub-inserters), before
completing the insert.
- Interesting, but probably not realistic, at least not without some
means of payment. On darknet, that might be possible with some simple
Ripple-like hop-by-hop accounting. Using a true currency such as
Bitcoin is IMHO too politically/legally risky, as well as impractical.
Also IMHO making such things cost-based is risky security-wise; who's
gonna run stuff at a loss just to see the traffic? The CIA, of course!
Never pay for anonymising services unless they are provably secure,
e.g. coin blinding.
- Plus, how do we know we can trust the group rendezvous point? Less of
a problem if it's only doing a single insert.

Something closely related: a dead man's switch protocol:
- Generate dead man certificates.
- Onion route them to various nodes on the network.
- The nodes receiving them will, after a specified delay, probe a
specific set of keys (different each time), and if there is no outcome,
will either insert a key or start an onion route.
- The latter triggers a combination like the #3 feedback above: if
enough keys arrive, and if whatever conditions are correct on the
recipient (e.g. time), it does something.

General programmable solution? Maybe. We don't want anything that is
over-reliant on a single node. So no ssh, no SVN, etc. But a lot of
stuff can be built from small, strictly limited programs running on a
handful of nodes.

What distinguishes Freenet from Tor and I2P:
- Freenet is a storage network.
- Most of its traffic is non-realtime. (Not strictly true, but that's
the intention.)
- Freenet does not rely on any single node.
- Freenet does not try to, or support, relying on any single node.

Preinsert tweaks:
- Do we want to be able to probe and partially renew a preinsert?
Possibly without fetching it, i.e. only the originator can do so.
- Or should a preinsert be usable *only* by revealing it?
- Do we want to be able to explicitly delete a preinsert, since it's
not a real insert anyway? Of course this is unreliable...
- And/or give it an expiration date? This at least is probably a good
idea, if we can implement it efficiently.

CONCLUSIONS?

We probably want all four options. And that's a lot of work! If that's
true, the question is what is most urgent. I guess the answer to that
is #1: rendezvous tunnels???
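P.S.: The "combine, using a shared secret scheme" step in the
fire-and-forget completion above could be as simple as an n-of-n XOR
split: the original inserter hands each tunnel one share, and the
rendezvous group X can only reconstruct the completion secret once every
tunneled insert has arrived. A minimal sketch (function names and the
all-shares-required policy are my assumptions; an "enough of the
inserts" threshold would need something like Shamir's scheme instead):

```python
import os

def split_secret(secret: bytes, n: int) -> list:
    # n-of-n XOR split: n-1 uniformly random shares, plus one share that
    # XORs with all the others back to the secret. Any n-1 shares reveal
    # nothing; all n are needed to reconstruct.
    shares = [os.urandom(len(secret)) for _ in range(n - 1)]
    last = bytearray(secret)
    for share in shares:
        for i, b in enumerate(share):
            last[i] ^= b
    shares.append(bytes(last))
    return shares

def combine_shares(shares: list) -> bytes:
    # XOR all shares together; only meaningful once every share arrived.
    out = bytearray(len(shares[0]))
    for share in shares:
        for i, b in enumerate(share):
            out[i] ^= b
    return bytes(out)

# Hypothetical usage: one share per insert tunnel, reconstructed at X.
completion_secret = b"top-block-go"
shares = split_secret(completion_secret, 5)
assert combine_shares(shares) == completion_secret
```

This keeps the completion trigger useless to X until the last tunneled
insert reports in, which is exactly the "insert the top block last"
property wanted above.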
_______________________________________________
Devl mailing list
[email protected]
https://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl
