Hi list,

I have made some simulations to study what happens when two social networks are 
connected by only a small number of links (from now on "the cut"), while 
Freenet routing is attempted between users running nodes in those networks. 
This scenario could arise with a powerful attacker who can, for example, filter 
traffic between the two social networks. The question is how badly data 
availability suffers as the nodes become increasingly separated.

It is obvious that anyone who can split up the network based on the keyspace 
will be able to separate nodes from data to some degree (in reality this also 
involves the caching mechanism). But this simulation examines something else: 
the impact of two underlying social networks (with users wanting to run full 
darknet) being connected by only a small number of links. The concern 
(discussed previously, e.g. on IRC) has been that different parts of the 
network may still become highly specialized in locations as a consequence of 
the constant swapping process that is going on. If that happened, nodes in 
one network would sooner or later forget data that becomes harder to reach in 
the other network.
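
As a reminder of how that swapping works, here is a rough sketch of the 
Metropolis-style swap rule (following Sandberg's algorithm as I understand it; 
the function names and the tie-breaking details are my own simplification, not 
Freenet's actual code):

import random

def circ_dist(a, b):
    """Circular distance between two positions in [0, 1)."""
    d = abs(a - b)
    return min(d, 1.0 - d)

def distance_product(pos, neighbor_positions):
    """Product of circular distances from one position to a set of neighbors."""
    prod = 1.0
    for p in neighbor_positions:
        prod *= max(circ_dist(pos, p), 1e-12)  # avoid a zero product
    return prod

def maybe_swap(pos_a, neigh_a, pos_b, neigh_b, rng=random):
    """Decide whether nodes a and b swap positions. Accept if swapping does not
    increase the product of node-to-neighbor distances; otherwise accept with
    probability before/after (Metropolis-style). Returns the new (pos_a, pos_b)."""
    before = distance_product(pos_a, neigh_a) * distance_product(pos_b, neigh_b)
    after = distance_product(pos_b, neigh_a) * distance_product(pos_a, neigh_b)
    if after <= before or rng.random() < before / after:
        return pos_b, pos_a  # swap accepted
    return pos_a, pos_b      # swap rejected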

Short description of the result (simulation details after that): the 
simulations indicate that position specialization is limited. When the cut is 
small, swap requests seldom cross between the networks, so little 
specialization occurs; as the cut grows, specialization increases, but a larger 
cut also makes it easier to navigate between the networks and reach the data. 
The main problem is that data availability drops considerably when the cut is 
really small. For more, see the simulation and plots below.

There are at least two ways to measure the impact of this problem. First, how 
much the locations specialize in the different networks, and whether the 
locations in the two networks "diverge" just a little or a lot. Second, the 
availability of data under some kind of request/insert model (one that we can 
have some confidence at least resembles reality), as described below.

1. Measure of location specialization: I have considered the distribution of 
the positions in the two separate networks. What one wants in any "healthy" 
network is a uniform distribution of positions, so one should use a measure of 
the distance between the actual distribution and the uniform distribution. For 
this, the variation distance was computed. It measures the difference in how 
two distributions place their probability "mass" onto the same underlying 
space (here: the positions). In theory the variation distance can be anything 
between 0 and 1 (when the distributions completely diverge); here the maximum 
is a little less than 1, since there is no way to fully "escape" the uniform 
reference distribution. We want this measure to be as close to 0 as possible 
(that is, the positions spread uniformly over the keyspace). It thus captures 
how many positions have "escaped" from one of the networks, and it rises 
faster when those positions disappear from specific parts of the keyspace.
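
For concreteness, here is a minimal sketch of how this measure can be computed 
from a list of node positions in [0, 1). The bucketing (and the bucket count) 
is my own simplification for illustration, not necessarily how the simulator 
does it:

def variation_distance_from_uniform(positions, buckets=100):
    """Total variation distance between the (bucketed) empirical distribution
    of node positions in [0, 1) and the uniform distribution."""
    counts = [0] * buckets
    for p in positions:
        counts[min(int(p * buckets), buckets - 1)] += 1
    n = float(len(positions))
    uniform = 1.0 / buckets
    # TV distance = 1/2 * sum over buckets of |empirical mass - uniform mass|
    return 0.5 * sum(abs(c / n - uniform) for c in counts)

With perfectly uniform positions this returns 0, and with all positions 
crammed into one bucket it returns 1 - 1/buckets (a little less than 1, as 
noted above).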

2. Data availability: the success rate and the number of steps in successful 
routes (here: the roundtrip steps) observed in the simulation.

The simulation: this was done by joining both ideal (generated) social networks 
and networks from an actual social-network dataset (the PGP WoT). Both are 
navigable networks, although they differ slightly. The hope is that joining 
these networks with different numbers of random connections corresponds 
roughly to the scenario above.

Simulation:
0. Load the two networks, each of size 4000. All nodes are set to be pure 
darknet.
1. Connect the networks by joining pairs of random nodes, where one node is 
picked from each network. I tried this with cuts of 10, 100, 1000, and 10000 
links.
2. Nodes randomize their positions and swap for a while to stabilize the 
locations (successful swaps become rarer after this), so that the simulation 
starts from a "stable" state.
3. 1000 keys are inserted by random nodes into the network (both sink and 
cache). Nodes have a store of 50 keys and a cache of 50 keys. Both are LRU as 
normal.
4. The main phase: nodes swap positions when they can, and request data over a 
longer interval (a sketch of the request side follows after this list). At the 
moment there are no new inserts and no churn; that could be the next thing to 
simulate, but it needs input data, see below. The goal is to see what the data 
availability is, and how the positions have been affected after some time. 
Nodes request data under keys of varying popularity: a standard Zipf 
distribution is used over the keys, and its parameter is also varied in the 
simulations. I have typically let the 4000 nodes make 100000 requests in total 
and evaluated how that went. It could clearly be evaluated over a longer time, 
but how long is reasonable?
5. Evaluate the two measures above (the variation distance always starts close 
to 0 when the positions are initially randomized).
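
To make step 4 more concrete, here is a minimal sketch of the request side: 
Zipf popularity sampling over the keys and the bookkeeping for success rate 
and roundtrip steps. The `route` callback stands in for the greedy darknet 
routing (and the interleaved swapping); it is an assumption of mine, not the 
actual simulator code:

import bisect
import random

def zipf_cdf(num_keys, s=1.0):
    """Cumulative distribution for a Zipf law over key ranks 1..num_keys."""
    weights = [1.0 / (r ** s) for r in range(1, num_keys + 1)]
    total = sum(weights)
    cdf, acc = [], 0.0
    for w in weights:
        acc += w / total
        cdf.append(acc)
    return cdf

def sample_key_index(cdf, rng=random):
    """Pick a key index (0-based rank) according to the Zipf popularity."""
    return min(bisect.bisect_left(cdf, rng.random()), len(cdf) - 1)

def main_phase(nodes, keys, route, num_requests=100000, s=1.0, rng=random):
    """Skeleton of step 4: nodes issue Zipf-distributed requests.
    `route(node, key)` is assumed to return (success, roundtrip_steps)."""
    cdf = zipf_cdf(len(keys), s)
    successes, total_steps = 0, 0
    for _ in range(num_requests):
        requester = rng.choice(nodes)
        key = keys[sample_key_index(cdf, rng)]
        success, steps = route(requester, key)
        if success:
            successes += 1
            total_steps += steps
    success_rate = successes / float(num_requests)
    avg_steps = total_steps / float(successes) if successes else float('nan')
    return success_rate, avg_steps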

Any thoughts on the above model? The constants are of course somewhat 
arbitrarily picked. The goal is to see the trends as simply as possible.

Some plots are attached (each plot shows simulations on both the actual WoT 
dataset and the ideal Kleinberg model):
1. How the size of the cut affects the availability of data (success rate)
2. As 1, but the average roundtrip steps in successful data requests
3. The variation distance depending on the size of the cut
4. Preferential search: how the popularity of the keys affects the above

Plots 1-3 were run with requests drawn from a Zipf distribution with parameter 
1, and plot 4 shows that varying this parameter may have some impact in at 
least some cases.

There is clearly more to examine, but I think I need some data to test 
reasonable settings and avoid fumbling in the dark with parameters (which is 
very easy to do): for example churn (the simulation can do both temporary and 
permanent leaves) with joins/leaves and position randomization. What do these 
rates look like in practice? Is there any data estimating the number of keys 
in the network? It would be good to tune the store/cache sizes to an estimated 
number of keys.

-- V.
Attachments:
- cut_sr.png (image/png, 66345 bytes): 
<https://emu.freenetproject.org/pipermail/devl/attachments/20081223/16aa179a/attachment.png>
- cut_avg.png (image/png, 50308 bytes): 
<https://emu.freenetproject.org/pipermail/devl/attachments/20081223/16aa179a/attachment-0001.png>
- cut_vd.png (image/png, 55669 bytes): 
<https://emu.freenetproject.org/pipermail/devl/attachments/20081223/16aa179a/attachment-0002.png>
- zipf_sr.png (image/png, 52563 bytes): 
<https://emu.freenetproject.org/pipermail/devl/attachments/20081223/16aa179a/attachment-0003.png>