On Wednesday 27 Feb 2013 18:54:34 Matthew Toseland wrote:
> operhiem1's graphs of probed total datastore size have been attacked recently 
> by nodes returning bogus store sizes (in the multi-petabyte range). This 
> caused a sudden jump in store sizes on the total store size graph. He 
> excluded outliers, and the spike went away, but now it's come back.
> 
> The simplest explanation is that the person whose nodes are returning the 
> bogus stats has hacked their node to return bogus datastore stats even when 
> it is relaying a probe request. Given we use fairly high HTLs (30?) for 
> probes, this can affect enough traffic to have a big impact on stats.
> 
> Total store size stats don't matter that much, but we need to use probe stats 
> for a couple of things that do:
> 1. Pitch Black prevention will require probing for the typical distance 
> between a node and its peers. Granted on darknet it's harder for an attacker 
> to have a significant number of edges / nodes distributed across the keyspace.
> 2. I would like to be able to test empirically whether a given change works. 
> Overall performance fluctuates too wildly based on too many factors, so 
> probing random nodes for a single statistic (e.g. the proportion of requests 
> rejected) seems the best way to sanity check a network-level change. If the 
> stats can be perverted this easily then we can't rely on them, so empiricism 
> doesn't work.
> 
> So how can we deal with this problem?
> 
> We can safely get stats from a randomly chosen target location, by routing 
> several parts of a probe request randomly and then towards that location. The 
> main problems with this are:
> - It gives too much control. Probes are supposed to be random.
> - A random location may not be a random node, e.g. for Pitch Black 
> countermeasures when we are being attacked.
> 
> For empiricism I guess we probably want to just have a relatively small 
> number of trusted nodes which insert their stats regularly - "canary" nodes?
> 
Preliminary conclusions, from talking to digger3:

There are 3 use cases.

1) Empirical confirmation when we do a build that changes something: measure 
a specific statistic to see whether the change worked. *NOT* overall 
performance; low-level stuff that should show a big change.
=> We can use "canary" nodes for this, run by people we trust. Some will need 
to run artificial configs, and they're probably not representative of the 
network as a whole.
=> TODO: We should organise this explicitly, preferably before trying the 
planned AIMD changes...
2) Pitch Black location distance detection.
=> Probably OK, because it's hard to get a lot of nodes in random places on the 
keyspace on darknet.
3) General stats: datastore size, bandwidth, link length distributions, etc. 
This stuff can and should affect development.
=> This is much harder. *Maybe* fetch from a random location, but even that 
is problematic?
=> However, we can improve this significantly by discarding a larger number 
of outliers.
Given that probes have HTL 30 (so each probe can be relayed through up to 
roughly 30 nodes), and assuming opennet, so that nodes are randomly 
distributed:
10 nodes could corrupt 5% of probes
21 nodes could corrupt 10% of probes
44 nodes could corrupt 20% of probes.
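
A rough sketch of the arithmetic behind figures like these, under assumptions 
that are not stated in this mail: each of the HTL hops independently lands on 
a malicious node with probability m/N, and the network has N nodes. The 
network size used below (5850) is a hypothetical value picked only because it 
roughly reproduces the numbers above; the function names are illustrative.

```python
import math

# ASSUMPTION: network size is not given in the mail; 5850 is a hypothetical
# value chosen because it roughly reproduces the quoted figures.
ASSUMED_NETWORK_SIZE = 5850


def corrupted_fraction(malicious_nodes, htl=30, network_size=ASSUMED_NETWORK_SIZE):
    """Chance that a probe touches at least one malicious node in htl hops.

    Simplified model: every hop is an independent uniform draw from the network.
    """
    p_hop_honest = 1.0 - malicious_nodes / network_size
    return 1.0 - p_hop_honest ** htl


def malicious_nodes_needed(target_fraction, htl=30, network_size=ASSUMED_NETWORK_SIZE):
    """Smallest node count that corrupts at least target_fraction of probes."""
    per_hop = 1.0 - (1.0 - target_fraction) ** (1.0 / htl)
    return math.ceil(per_hop * network_size)


if __name__ == "__main__":
    for target in (0.05, 0.10, 0.20):
        print(f"{target:.0%} of probes: {malicious_nodes_needed(target)} nodes")
```

The point of the model is only that the required node count grows roughly 
linearly with the fraction of probes to be corrupted, and is small compared 
with the size of the network.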

Also note that the impact depends on what the stat is - the probe request 
stats are percentages from 0 to 100, so a bogus value can only skew them so 
far, whereas a reported datastore size is unbounded and can be *big*.
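
To illustrate both points (discarding outliers, and bounded vs. unbounded 
stats), here is a minimal sketch using a trimmed mean. This is one possible 
way to discard outliers, not necessarily what operhiem1's graphs do, and all 
the sample values are made up for the example.

```python
def trimmed_mean(samples, trim_fraction=0.10):
    """Mean after dropping the lowest and highest trim_fraction of samples."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(samples)
    k = int(len(ordered) * trim_fraction)  # samples dropped at each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# HYPOTHETICAL data: 95 honest probe results and 5 corrupted ones.
store_sizes_gib = [20.0] * 95 + [4.0 * 1024 * 1024] * 5  # bogus: 4 PiB each
reject_percent = [10.0] * 95 + [100.0] * 5               # bogus: capped at 100

store_plain = sum(store_sizes_gib) / len(store_sizes_gib)
store_trimmed = trimmed_mean(store_sizes_gib, 0.10)
reject_plain = sum(reject_percent) / len(reject_percent)
reject_trimmed = trimmed_mean(reject_percent, 0.10)

if __name__ == "__main__":
    print("store size  plain:", store_plain, " trimmed:", store_trimmed)
    print("reject rate plain:", reject_plain, " trimmed:", reject_trimmed)
```

With these made-up numbers the untrimmed store size mean is off by four 
orders of magnitude, while the untrimmed percentage is only pushed from 10 to 
14.5. Trimming 10% at each end removes the 5% of bogus results in both cases, 
but only as long as the attacker corrupts fewer probes than we discard.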


_______________________________________________
Devl mailing list
https://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl
