Re: [freenet-dev] Performance with almost disconnected darknets
On Tue, Dec 23, 2008 at 12:48:04PM +, Matthew Toseland wrote:
> On Tuesday 23 December 2008 03:14, vive wrote:
> > Simulation:
> > 0. Load the two networks, each of size 4000. All nodes are set to be pure darknet.
> > 1. Connect the networks by joining pairs of random nodes, where one node is picked from each network. I tried this with a cut of 10,100,1000,1 links.
> > 2. Nodes randomize their positions and swap for a while to stabilize the locations (successful swaps get more uncommon after this), to start from a stable state.
> > 3. 1000 keys are inserted by random nodes into the network (both sink and cache). Nodes have a store of 50 keys and a cache of 50 keys. Both are LRU as normal.
>
> Actually we use pseudo-random replacement now. This probably doesn't strongly affect the results.

Is this documented? I was unable to find it.

It would be great to have a place on the wiki where the heuristics related to (1) routing and (2) caching/storing are documented, maybe at least a short description of each. This way, simulations can be made more realistic and design decisions made clearer. Right now the simulations do routing with backtracking, as well as storing/caching according to the sinking strategy (store) and LRU (both store and cache).

One could of course read the code ... but it may also be hard to get the high-level description right away from the many details of lower-level code. :-) Is it possible that someone who has insight into the details of the heuristics could write a short description?

Cheers,
/vive
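For readers following the thread, the "routing with backtracking" used in the simulations can be sketched roughly as below. This is an illustrative Python sketch, not the simulator or fred code: the Node fields, the circle_distance helper, the hops-to-live handling, and the recursive depth-first backtracking are all assumptions about how such routing is commonly modelled.

    from dataclasses import dataclass, field

    def circle_distance(a, b):
        """Distance between two locations on the circular keyspace [0, 1)."""
        d = abs(a - b)
        return min(d, 1.0 - d)

    @dataclass
    class Node:
        location: float                             # position in [0, 1)
        peers: list = field(default_factory=list)   # darknet connections
        store: set = field(default_factory=set)     # keys held in the store
        cache: set = field(default_factory=set)     # keys held in the cache

    def route(node, key, key_location, htl, visited=None):
        """Greedy depth-first routing with backtracking (sketch only).

        At each hop the closest not-yet-visited peer to the key's location
        is tried first; if that whole branch fails, the next-closest peer
        is tried, and so on, until the hops-to-live run out.  Returns the
        successful path as a list of nodes, or None on failure.
        """
        if visited is None:
            visited = set()
        if key in node.store or key in node.cache:
            return [node]                           # data found here
        if htl <= 0:
            return None
        visited.add(id(node))
        candidates = sorted(
            (p for p in node.peers if id(p) not in visited),
            key=lambda p: circle_distance(p.location, key_location),
        )
        for peer in candidates:
            path = route(peer, key, key_location, htl - 1, visited)
            if path is not None:
                return [node] + path                # success somewhere below
        return None                                 # dead end: backtrack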
Re: [freenet-dev] Performance with almost disconnected darknets
On Monday 05 January 2009 14:17, vive wrote:
> On Tue, Dec 23, 2008 at 12:48:04PM +, Matthew Toseland wrote:
> > > 2. Data availability: the success rate and the number of steps in successful routes (here: the roundtrip steps) as seen in simulation.
> > >
> > > The simulation: this was made by joining both ideal (generated) social networks and networks from actual datasets on social networks (the PGP WoT). Both are navigable networks, although slightly differing. It is hoped that this can correspond roughly to the scenario above, by joining these networks with different numbers of random connections.
> > >
> > > Simulation:
> > > 0. Load the two networks, each of size 4000. All nodes are set to be pure darknet.
> > > 1. Connect the networks by joining pairs of random nodes, where one node is picked from each network. I tried this with a cut of 10,100,1000,1 links.
> > > 2. Nodes randomize their positions and swap for a while to stabilize the locations (successful swaps get more uncommon after this), to start from a stable state.
> > > 3. 1000 keys are inserted by random nodes into the network (both sink and cache). Nodes have a store of 50 keys and a cache of 50 keys. Both are LRU as normal.
> >
> > Actually we use pseudo-random replacement now. This probably doesn't strongly affect the results.
> >
> > > 4. The main phase: nodes swap positions when they can, and request data for a longer interval. (At the moment there are no new inserts and no churn; this could be the next thing to simulate, but it needs data input, see below.) The goal is to see what the data availability is, and how the positions have been affected after some time. Nodes request data via keys that have different popularity: a standard Zipf distribution is used over the keys. The parameter is also varied in the simulations. I have typically let the 4000 nodes do in total 10 requests and evaluated how that went. It could clearly be evaluated over a longer time, but how long is reasonable?
> > > 5. Evaluate the two measures above (variation distance always starts close to 0 when one initially randomizes).
> >
> > The real-world case most interesting to me is that of two darknets slowly merging. Does this result in a severe drop in performance in the short run? Hence, an important measurement is the success rate for requests made on the same side of the border as the data was inserted on. Is this dramatically lower with a smaller cut? How does it compare to that darknet running on its own?
>
> Hi! I did some simulations of this. See the attached plots. The model here is as described previously, but it tracks data and keys with respect to which part of the network they were originally inserted in. Routes where nodes query for keys that were originally inserted in the same network part are labelled short routes; routes where they search for keys inserted in the other part are labelled long routes.

This is great: it demonstrates with a reasonable degree of confidence that two darknets approaching each other and slowly merging will not have their local routing trashed, so we do not need any special handling for separate subnetworks; and, provided we assume the current model of just doing lots of requests, the most popular data from the other side will generally become available eventually, and this improves dramatically as more links are established.

> The impact of this varies depending on the size of the cut and on how popular certain keys are. The plots illustrate both these dependencies. The simulations show at least two things regarding the previous fear.
> First, connecting two darknets loosely does not seem to break routing that much within a darknet, but it shields the nodes off from data inserted in the other network. (And that data is at least slowly made available again. This is not very surprising, but here are the simulations showing some impact.) Second, real-world networks seem to be slightly better (than fair models where all nodes get the same number of connections) at making data available via long routes. This is seen by comparing the two plots where the Zipf parameter is varied. The last effect is probably due to a degree distribution different from the ideal model's (in the real networks, a few high-degree nodes naturally see more queries passing through, quickly making remote data available for more nodes in their own network).

It seems to be a lot more than slightly, if I'm reading the plots right. Small world with fixed degree seems to be a lot slower than the somewhat-scale-free small-world networks that real-world data tends to produce. But it's a tradeoff between performance and vulnerability ... ignoring the question of the capacity of well-connected nodes...
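The swapping referred to in steps 2 and 4 is not spelled out in this thread. The sketch below is the Metropolis-style location swap from Oskar Sandberg's work, on which Freenet's swapping is based; whether vive's simulator follows it exactly is an assumption, and maybe_swap plus the peers/location fields are the hypothetical ones from the routing sketch earlier in the thread.

    import random

    def circle_distance(a, b):
        d = abs(a - b)
        return min(d, 1.0 - d)

    def maybe_swap(a, b, rng=random):
        """One swap attempt between nodes a and b (Metropolis-style sketch).

        Compare the product of edge lengths to each node's neighbours before
        and after a hypothetical exchange of locations.  Always swap if the
        product shrinks; otherwise swap with probability before/after.  Over
        time this pulls the locations towards a small-world-friendly layout.
        """
        before = after = 1.0
        for n in a.peers:
            if n is b:                  # a direct a-b edge is unchanged
                continue                # by the swap, so skip it
            before *= circle_distance(a.location, n.location)
            after *= circle_distance(b.location, n.location)
        for n in b.peers:
            if n is a:
                continue
            before *= circle_distance(b.location, n.location)
            after *= circle_distance(a.location, n.location)
        if after <= before or rng.random() < before / after:
            a.location, b.location = b.location, a.location
            return True
        return False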
Re: [freenet-dev] Performance with almost disconnected darknets
On Monday 05 January 2009 14:27, vive wrote:
> On Tue, Dec 23, 2008 at 12:48:04PM +, Matthew Toseland wrote:
> > On Tuesday 23 December 2008 03:14, vive wrote:
> > > Simulation:
> > > 0. Load the two networks, each of size 4000. All nodes are set to be pure darknet.
> > > 1. Connect the networks by joining pairs of random nodes, where one node is picked from each network. I tried this with a cut of 10,100,1000,1 links.
> > > 2. Nodes randomize their positions and swap for a while to stabilize the locations (successful swaps get more uncommon after this), to start from a stable state.
> > > 3. 1000 keys are inserted by random nodes into the network (both sink and cache). Nodes have a store of 50 keys and a cache of 50 keys. Both are LRU as normal.
> >
> > Actually we use pseudo-random replacement now. This probably doesn't strongly affect the results.
>
> Is this documented? I was unable to find it.
>
> It would be great to have a place on the wiki where the heuristics related to (1) routing and (2) caching/storing are documented, maybe at least a short description of each. This way, simulations can be made more realistic and design decisions made clearer. Right now the simulations do routing with backtracking, as well as storing/caching according to the sinking strategy (store) and LRU (both store and cache).

There are quite a few places on the wiki where parts of the design are documented ... these are usually way out of date.

> One could of course read the code ... but it may also be hard to get the high-level description right away from the many details of lower-level code. :-) Is it possible that someone who has insight into the details of the heuristics could write a short description?

The sinking strategy is simply that we cache in the store only when an insert is on a node which is closer to the target than any of its peers. Do you want more details on the routing heuristics? Or to start a more general debate on how much detailed internal documentation we should make, where we should put it, etc.?

> Cheers,
> /vive
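The one-line description of the sinking strategy above translates fairly directly into code. A minimal sketch, again using the hypothetical Node objects from the routing sketch earlier in the thread; the names should_sink and handle_insert, the tie-breaking with <=, and the unconditional caching along the insert path are assumptions of this sketch.

    def circle_distance(a, b):
        d = abs(a - b)
        return min(d, 1.0 - d)

    def should_sink(node, key_location):
        """True if an insert should go into this node's store.

        Following the description above: the node stores the data only if
        its own location is at least as close to the key as the location
        of every one of its peers.
        """
        my_dist = circle_distance(node.location, key_location)
        return all(
            my_dist <= circle_distance(p.location, key_location)
            for p in node.peers
        )

    def handle_insert(node, key, key_location):
        """Cache along the insert path; sink into the store where allowed."""
        node.cache.add(key)
        if should_sink(node, key_location):
            node.store.add(key)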
Re: [freenet-dev] Performance with almost disconnected darknets
On Mon, Jan 05, 2009 at 07:19:35PM +, Matthew Toseland wrote:
> On Monday 05 January 2009 14:27, vive wrote:
>
> There are quite a few places on the wiki where parts of the design are documented ... these are usually way out of date.
>
> > One could of course read the code ... but it may also be hard to get the high-level description right away from the many details of lower-level code. :-) Is it possible that someone who has insight into the details of the heuristics could write a short description?
>
> The sinking strategy is simply that we cache in the store only when an insert is on a node which is closer to the target than any of its peers. Do you want more details on the routing heuristics? Or to start a more general debate on how much detailed internal documentation we should make, where we should put it, etc.?

I have understood how sinking works, but it would be really great to see the other major heuristics for routing and storage documented as well! I am not the guy to analyze what level of (overall) detail is best for the whole project. But these things have to go into the simulation sooner or later to make it more realistic.

/v.
Re: [freenet-dev] Performance with almost disconnected darknets
On Tuesday 23 December 2008 03:14, vive wrote:
> Hi list,
>
> I have made some simulations to study what happens when two social networks are connected only by some small number of links (from now on, the cut), while Freenet routing is attempted between users running nodes in those networks. This scenario may appear with a powerful attacker who can e.g. implement a filtering mechanism between two social networks. Or it may occur naturally as two growing darknets meet each other, in which case hopefully it will be a transitional phase. The question is how hard it will strike against the availability of data when nodes get increasingly separated.
>
> It is obvious that anyone who can split up the network based on the keyspace will be able to separate nodes from data to some degree (in reality this also includes the caching mechanism). But this simulation examines something else: the impact of two underlying social networks (with users wanting to run full darknet) potentially connected by only a small number of links. The concern (discussed previously e.g. over irc) has been that different parts of the network may still become highly specialized in locations as a consequence of the constant swapping process that is going on. If that happened, nodes in one network would sooner or later forget data that becomes harder to reach in the other network.
>
> Short description of the result (simulation details after that): the simulations indicate that specialization of the positions does not happen that much; it happens quite little when the cut is small, since swap requests then seldom move between the networks. When the cut between the networks is larger, it increasingly happens. But with a larger cut it is also easier to navigate between the networks and reach the data ... The main problem is that the availability of data gets lowered quite a bit when the cut is really small. For more, see the simulation and plots below.
>
> There are at least two ways to measure the impact of this problem. First, how much the locations specialize in different networks, and whether the locations in the different networks diverge just a little or a lot. Second, the availability of data under some kind of request/insert model (one that we should have confidence in at least resembling reality), as described below.
>
> 1. Measure of location specialization: I have considered the distribution of the positions in the two separate networks. What one wants in any healthy network is a uniform distribution of positions, so one should use a measure of the distance between the actual distribution and the uniform distribution. For this, the variation distance was computed. The measure captures the difference in how two distributions put their probability mass onto the same underlying space (here: the positions). In theory the variation distance can be anything between 0 and 1 (1 if the distributions completely diverge); here it stays a little bit less than 1 (there is no way here to escape the reference to a uniform distribution). We want this measure to be as close to 0 as possible (that is, the positions should be uniformly spread over the keyspace). It thus detects when many positions have escaped from one of the networks, and it grows faster when these positions disappear from specific parts of the keyspace.
>
> 2. Data availability: the success rate and the number of steps in successful routes (here: the roundtrip steps) as seen in simulation.
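The variation distance in measure 1 above is the standard total variation distance between the empirical distribution of node locations and the uniform distribution. A small sketch in Python; the function name and the bin count are parameters of this sketch, not values taken from the simulator.

    def variation_distance(locations, bins=100):
        """Total variation distance between the empirical distribution of
        node locations (binned over the keyspace [0, 1)) and the uniform
        distribution.  Close to 0 means the locations cover the keyspace
        evenly; values approaching 1 mean they have collapsed onto a few
        regions.
        """
        counts = [0] * bins
        for loc in locations:
            counts[int(loc * bins) % bins] += 1
        n = len(locations)
        uniform = 1.0 / bins
        return 0.5 * sum(abs(c / n - uniform) for c in counts)

    # 4000 locations drawn uniformly at random give a value near 0, while
    # 4000 locations squeezed into one half of the keyspace give about 0.5.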
> The simulation: this was made by joining both ideal (generated) social networks and networks from actual datasets on social networks (the PGP WoT). Both are navigable networks, although slightly differing. It is hoped that this can correspond roughly to the scenario above, by joining these networks with different numbers of random connections.
>
> Simulation:
> 0. Load the two networks, each of size 4000. All nodes are set to be pure darknet.
> 1. Connect the networks by joining pairs of random nodes, where one node is picked from each network. I tried this with a cut of 10,100,1000,1 links.
> 2. Nodes randomize their positions and swap for a while to stabilize the locations (successful swaps get more uncommon after this), to start from a stable state.
> 3. 1000 keys are inserted by random nodes into the network (both sink and cache). Nodes have a store of 50 keys and a cache of 50 keys. Both are LRU as normal.

Actually we use pseudo-random replacement now. This probably doesn't strongly affect the results.

> 4. The main phase: nodes swap positions when they can, and request data for a longer interval. (At the moment there are no new inserts and no churn; this could be the next thing to simulate, but it needs data input, see below.) The goal is to see what the data availability is, and how the positions have been affected after some time. Nodes request data via keys that have different popularity: a standard Zipf distribution is used over the keys.
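The Zipf request model in step 4 can be sketched as follows. The function names, the assumption that keys are ranked most popular first, the exponent name s, and the use of Python's random.choices are details of this sketch rather than of the simulator.

    import random

    def zipf_weights(n_keys, s):
        """Unnormalised Zipf weights: the key of rank r gets weight 1 / r**s."""
        return [1.0 / (r ** s) for r in range(1, n_keys + 1)]

    def sample_requests(keys, s, n_requests, rng=random):
        """Draw request targets with Zipf-distributed popularity.

        keys are assumed to be ordered by rank, most popular first; s is
        the Zipf exponent that the thread calls "the parameter".
        """
        weights = zipf_weights(len(keys), s)
        return rng.choices(keys, weights=weights, k=n_requests)

    # Illustrative numbers only, not taken from the thread:
    # requests = sample_requests(list(range(1000)), 1.0, 10000)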