On Tuesday 23 December 2008 03:14, vive wrote:
> Hi list,
> 
> I have made some simulations to study what happens when two social
> networks are connected only by some small number of links (from now on
> the "cut"), while freenet routing is attempted between users running
> nodes in those networks. This scenario may appear with a powerful
> attacker who can e.g. implement a filtering mechanism between two
> social networks.

Or it may occur naturally as two growing darknets meet each other, in which 
case hopefully it will be a transitional phase.

> The question is how hard this will hit the availability of data as
> nodes get increasingly separated.
> 
> It is obvious that anyone who can split up the network based on the
> keyspace will be able to separate nodes from data to some degree (in
> reality this also involves the caching mechanism). But this simulation
> examines something else: the impact of two underlying social networks
> (with users wanting to run full darknet) potentially connected by only
> a small number of links. The concern (discussed previously, e.g. over
> irc) has been that different parts of the network may still become
> highly specialized in their locations as a consequence of the constant
> swapping process. If that were to happen, nodes in one network would
> sooner or later forget data that then becomes harder to reach from the
> other network.
> 
> Short description of the result (simulation details after that):
> simulations indicate that the positions do not specialize all that
> much. It happens very little when the cut is small, since swap
> requests then seldom move between the networks. When the cut between
> the networks is larger, it happens increasingly, but with a larger cut
> it is also easier to navigate between the networks and reach the
> data ... The main problem is that the availability of data drops quite
> a bit when the cut is really small. For more, see the simulation and
> plots below.
> 
> There are at least two ways to measure the impact of this problem.
> First, how much the locations specialize in the different networks,
> and whether the locations in the two networks "diverge" just a little
> or a lot. Second, the availability of data under some kind of
> request/insert model (one that we can have some confidence at least
> resembles reality), as described below.
> 
> 1. Measure of location specialization: I have considered the
> distribution of the positions in the two separate networks. What one
> wants in any "healthy" network is a uniform distribution of positions,
> so one should use a measure of the distance between the actual
> distribution and the uniform distribution. For this, the variation
> distance was computed. The measure captures the difference in how two
> distributions put their probability "mass" onto the same underlying
> space (here: the positions). In theory the variation distance can be
> anything between 0 and 1 (if the distributions completely diverge);
> here it stays a little below 1, since there is no way to completely
> "escape" the uniform reference distribution. We want this measure to
> be as close to 0 as possible (that is, the positions to be spread
> uniformly over the keyspace). It thus detects when many positions have
> "escaped" from one of the networks, and it grows faster when these
> positions disappear from specific parts of the keyspace.
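
For reference, here is a minimal sketch of how I read this measure,
binning the positions into B equal-width bins (the bin count is my
choice, not something taken from your simulator):

# Sketch: total variation distance between the empirical distribution of
# node positions and the uniform distribution, using `bins` equal-width
# bins. Assumes positions lie in [0, 1).
def variation_distance(positions, bins=100):
    counts = [0] * bins
    for p in positions:
        counts[int(p * bins) % bins] += 1
    n = float(len(positions))
    uniform = 1.0 / bins
    # TV distance = 1/2 * sum over bins of |empirical - uniform|
    return 0.5 * sum(abs(c / n - uniform) for c in counts)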
> 
> 2. Data availability: the success rate and the number of steps in
> successful routes (here: the roundtrip steps) as seen in the
> simulation.
> 
> The simulation: this was done by joining both ideal (generated) social
> networks and networks from actual datasets on social networks (the PGP
> WoT). Both are navigable networks, although slightly different. It is
> hoped that this corresponds roughly to the scenario above, with the
> networks joined by different numbers of random connections.
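
For the "ideal" case, I assume something like a 1-D Kleinberg
construction; a rough, unoptimized sketch (the parameters and the
shortcut count are my guesses, not taken from your setup):

import random

# Rough sketch of a 1-D Kleinberg-style navigable network on a ring of n
# nodes: a ring lattice plus `shortcuts` long-range links per node, chosen
# with probability proportional to 1/distance. Illustrative only.
def kleinberg_ring(n=4000, shortcuts=5, seed=0):
    rng = random.Random(seed)
    edges = set()
    for i in range(n):
        nbr = (i + 1) % n
        edges.add((min(i, nbr), max(i, nbr)))      # local ring link
        others = [j for j in range(n) if j != i]
        weights = [1.0 / min(abs(i - j), n - abs(i - j)) for j in others]
        for m in rng.choices(others, weights=weights, k=shortcuts):
            edges.add((min(i, m), max(i, m)))      # long-range link
    return edges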
> 
> Simulation:
> 0. Load the two networks, each of size 4000. All nodes are set to be
> pure darknet.
> 1. Connect the networks by joining pairs of random nodes, where one
> node is picked from each network. I tried this with a cut of 10, 100,
> 1000 and 10000 links.
> 2. Nodes randomize their positions and swap for a while to stabilize
> the locations (successful swaps become more uncommon after this), so
> as to start from a "stable" state.
> 3. 1000 keys are inserted by random nodes into the network (both sink
> and cache). Nodes have a store of 50 keys and a cache of 50 keys. Both
> are LRU as normal.

Actually we use pseudo-random replacement now. This probably doesn't strongly 
affect the results.
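
For anyone reimplementing this, the difference between the two policies
is small in code; a toy sketch (not the actual node code) of a
fixed-size store with a switchable eviction policy:

import random
from collections import OrderedDict

# Sketch of a fixed-capacity key store. "lru" evicts the least recently
# used key; "random" evicts a (pseudo-)randomly chosen key.
class Store:
    def __init__(self, capacity=50, policy="lru", seed=0):
        self.capacity = capacity
        self.policy = policy
        self.rng = random.Random(seed)
        self.data = OrderedDict()

    def get(self, key):
        if key in self.data:
            if self.policy == "lru":
                self.data.move_to_end(key)   # mark as recently used
            return self.data[key]
        return None

    def put(self, key, value):
        if key in self.data:
            self.data[key] = value
            if self.policy == "lru":
                self.data.move_to_end(key)
            return
        if len(self.data) >= self.capacity:
            if self.policy == "lru":
                self.data.popitem(last=False)             # evict oldest
            else:
                victim = self.rng.choice(list(self.data))
                del self.data[victim]                     # evict random key
        self.data[key] = value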

> 4. The main phase: nodes swap positions when they can, and request
> data over a longer interval. (At the moment there are no new inserts
> and no churn; this could be the next thing to simulate, but it needs
> input data, see below.) The goal is to see what the data availability
> is, and how the positions have been affected after some time. Nodes
> request data via keys that have different popularity: a standard Zipf
> distribution is used over the keys. Its parameter is also varied in
> the simulations. I have typically let the 4000 nodes do 100000
> requests in total and evaluated how that went. It could clearly be
> evaluated over a longer time, but how long is reasonable?
> 5. Evaluate the two measures above (the variation distance always
> starts close to 0 when the positions are initially randomized).
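
To check I've understood the pipeline above, here is roughly how I
picture steps 1 and 4; route() stands in for your routing-plus-swapping
machinery, and the helper names and the Zipf sampling are mine:

import random

def join_networks(nodes_a, nodes_b, cut, rng):
    # Step 1: connect the two networks with `cut` random cross links,
    # one endpoint picked from each network.
    return [(rng.choice(nodes_a), rng.choice(nodes_b)) for _ in range(cut)]

def zipf_weights(num_keys, s=1.0):
    # P(key of rank k) proportional to 1/k^s; s is the Zipf parameter
    # (fixed at 1 in plots 1-3).
    return [1.0 / (k ** s) for k in range(1, num_keys + 1)]

def request_phase(nodes, keys, route, num_requests=100000, s=1.0, seed=0):
    # Step 4: random requesters ask for keys drawn from a Zipf
    # distribution; route(node, key) is assumed to return
    # (success, roundtrip_steps) and to do swapping as a side effect.
    rng = random.Random(seed)
    weights = zipf_weights(len(keys), s)
    successes, total_steps = 0, 0
    for _ in range(num_requests):
        requester = rng.choice(nodes)
        key = rng.choices(keys, weights=weights, k=1)[0]
        found, steps = route(requester, key)
        if found:
            successes += 1
            total_steps += steps
    success_rate = successes / float(num_requests)
    avg_steps = total_steps / float(max(successes, 1))
    return success_rate, avg_steps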
> 
> Any thoughts on the above model? The constants are of course somewhat
> arbitrarily picked. The goal is to see the trends as simply as
> possible.
> 
> Some plots I attach (each plot shows simulations on both the actual
> WoT dataset and the ideal Kleinberg model):
> 1. How the size of the cut affects the availability of data (success
> rate)
> 2. As 1, but the average number of roundtrip steps in successful data
> requests
> 3. The variation distance depending on the size of the cut
> 4. Preferential search: how the popularity of the keys affects the
> above
> 
> Plots 1-3 were run with requests drawn from a Zipf distribution with
> the parameter fixed at 1; plot 4 shows that varying this parameter may
> have some impact in at least some cases.

The real world case most interesting to me is that of two darknets slowly 
merging. Does this result in a severe drop in performance in the short run? 
Hence, an important measurement is the success rate for requests made on the 
same side of the border as the data was inserted on. Is this dramatically 
lower with a smaller cut? How does it compare to that darknet running on its 
own?
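
Concretely, the breakdown I have in mind is something like this (the
request-log format is my assumption, not something the simulator
necessarily produces):

from collections import defaultdict

# Sketch: break the success rate down by whether the requester was on the
# same side of the cut as the node that inserted the key. `request_log` is
# assumed to be a list of (requester_side, insert_side, succeeded) tuples.
def success_by_side(request_log):
    attempts = defaultdict(int)
    successes = defaultdict(int)
    for requester_side, insert_side, succeeded in request_log:
        group = "same side" if requester_side == insert_side else "cross cut"
        attempts[group] += 1
        if succeeded:
            successes[group] += 1
    return {g: successes[g] / float(attempts[g]) for g in attempts}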
> 
> There is clearly more to examine, but I think I need some data in
> order to test for reasonable things and avoid fumbling in the dark
> with parameters (which is very easy to do): for example churn (the
> simulation can do both temporary and permanent leaves) with
> joins/leaves and randomization. What do these rates look like in
> practice? Is there any data that estimates the number of keys in the
> network? It would be good to tune the store/cache sizes to an
> estimated number of keys.

I have no idea. We're talking "real" darknets here, which are rather 
hypothetical at the moment - if anyone is running any they're not telling 
us. :( Darknet links should have relatively low churn, although how 
frequently people join and leave the network is uncertain. As regards 
storage ... we use a fraction of the available disk space by default ... 5% 
above 20G (with a limit of 256GB); 10% above 10G; 512MB above 5G; 256MB 
below that. Compare to bandwidth usage, which is seldom over 50KB/sec and 
is maybe 70% actual data transfer. The number of keys in the network is 
difficult to estimate - we do have reachability problems though, so clearly 
more content is being inserted continually.
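
In code form, my reading of those defaults is roughly the following; the
exact thresholds and rounding in the node may differ, so treat it as a
sketch:

GB = 1024 ** 3
MB = 1024 ** 2

# Rough sketch of the default datastore sizing rule described above.
def default_store_size(free_disk_bytes):
    if free_disk_bytes > 20 * GB:
        return min(int(0.05 * free_disk_bytes), 256 * GB)  # 5%, capped at 256GB
    if free_disk_bytes > 10 * GB:
        return int(0.10 * free_disk_bytes)                 # 10%
    if free_disk_bytes > 5 * GB:
        return 512 * MB
    return 256 * MB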

Anything else?
> 
> -- V.
