The fundamental routing model for Freenet (opennet and darknet) is based on the assumption that link lengths (the distance between a link's two endpoint nodes) are properly distributed. To achieve good routing efficiency, the cumulative distribution of link lengths should follow a logarithmic curve; that is, any given node should have approximately equal numbers of links of length 0.25-0.5 (0.5 being the longest possible link), 0.125-0.25, 0.0625-0.125, and so on. For details, see the papers by Kleinberg:

http://www.cs.cornell.edu/home/kleinber/nat00.pdf
http://www.cs.cornell.edu/home/kleinber/swn.pdf

Roughly speaking, this distribution means that requests start by taking long hops and then narrow in on the target location via smaller and smaller hops, permitting O(log(n)^2) route lengths. Opennet and darknet both attempt to maintain this distribution: opennet through connection churn, darknet through location swaps.
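For concreteness, here is a minimal sketch (Python, not Freenet code) of what that ideal distribution looks like: link lengths drawn with density proportional to 1/d, then counted per halving of distance, which should give roughly equal bucket sizes. The lower cutoff D_MIN is an assumption I've added, since the 1/d density needs one.

import math
import random

# Sketch only: sample link lengths with density proportional to 1/d between
# an assumed minimum length D_MIN and the maximum possible length 0.5.
D_MIN, D_MAX = 1e-4, 0.5

def sample_ideal_length():
    # If p(d) is proportional to 1/d, then log(d) is uniformly distributed,
    # so inverse-CDF sampling is just exponentiating a uniform variate.
    u = random.random()
    return D_MIN * (D_MAX / D_MIN) ** u

lengths = [sample_ideal_length() for _ in range(10000)]

# Count links per halving of distance: 0.25-0.5, 0.125-0.25, 0.0625-0.125, ...
# Under the ideal distribution each bucket holds roughly the same number.
buckets = {}
for d in lengths:
    octave = int(math.floor(math.log2(D_MAX / d)))
    buckets[octave] = buckets.get(octave, 0) + 1

for octave in sorted(buckets):
    lo, hi = D_MAX / 2 ** (octave + 1), D_MAX / 2 ** octave
    print("%.5f - %.5f: %d" % (lo, hi, buckets[octave]))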
TheSeeker raised an interesting point in IRC yesterday: link lengths on opennet nodes don't actually appear to follow this distribution. I've made only a very cursory start on the analysis, and since I'm busy with other parts of Freenet I'm hoping someone else can continue it.

I took the FOAF location data from my opennet node (normally I have a couple of darknet peers, but they were offline when I took the data, so this is purely opennet data). From that I can see the link lengths from my node to its peers, and from those peers to their peers. I then plotted the distribution of link lengths, on both a linear scale and a log scale:

freenet:CHK@CQnUwrJpEOOl7CGHE7jHE47WClMS1unpcgNRC~yBCIw,Tzdx8plj--XJmReCOAxmV3TdCVz66pKe~snXgSpK~aI,AAIC--8/locations.png
freenet:CHK@fgKX0f57ivs7qvLg1QaqxMudLdMrHXVephO-gsMgEPY,VRcVHdnLnq9UDi-WMyk8IzW12wq-YTj2M5z2H2fawmE,AAIC--8/log_locations.png

Spreadsheet those came from:

freenet:CHK@MXlEbI5oppUIOU3n3MJ7xjvH9I1DXGcPSBqjHebtWaQ,j9pG8L-hWoUVYRbKtwQMeM~8siE1kfGlrBucI9xLe~U,AAIC--8/locations.gnumeric

The log-scale graph is probably the more informative one: ideally, those points should approximate a straight line. Clearly they don't, though I haven't run any statistical measures. (The curious might want to start here: http://en.wikipedia.org/wiki/Kolmogorov–Smirnov_test .)

There are several confounding factors.

First, the data points aren't independent: there should be local clustering, and I seem to have double-counted the links to my own node (18 of 353 data points). (Links between my peers would also be double-counted, but there don't appear to be any.)

Second, there's the bandwidth issue: some of my peers are faster than others, as can be seen from their varying numbers of FOAF locations. Peers with more FOAF locations will receive more traffic, in rough proportion to the number of FOAF locations they advertise. I'm uncertain how the link length distribution should account for this; perhaps a link to a peer should be counted n times, where n is the number of FOAF locations it advertises? Or perhaps not; figuring that out would take some theoretical work I haven't done. Averaged across a large number of nodes and links, that effect should wash out.

Third, we must be wary of observer bias: nodes with more connections are more likely to show up in a random sample of nodes. This affects the FOAF link length counts, but not the counts of my own node's links.

I'm not sure what to make of this data: it may represent a deep-rooted problem with routing, or it might go away once I properly account for the above effects and gather more data. I can think of several possible sources of problems. For example, the LRU link retention policy will tend to favor badly distributed links to nodes that have higher bandwidth (and therefore more FOAF locations), simply because we route more requests to them. Similarly, it will tend to favor nodes with large caches that serve recent or popular data effectively over nodes with smaller caches that route unpopular requests better. One simple problem is that strict LRU data is noisy; combined with nodes that aren't always online, that might cause too much random connection churn and too little structured connection churn. I believe we could smooth the data in some fashion without undermining the theoretical basis behind the LRU criterion (though I'd have to look at the relevant paper in more detail). We could also consider trying to correct for FOAF location count somehow.
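If someone wants to pick up the statistical side, here is a rough sketch of how the suggested check could go (Python with numpy/scipy). The location pairs are made-up placeholders; the real input would be the FOAF locations from the spreadsheet above. It ignores the independence and double-counting caveats, and D_MIN is an assumed lower cutoff on link length.

import numpy as np
from scipy import stats

def link_length(a, b):
    """Circular distance between two locations in [0, 1); the maximum is 0.5."""
    d = abs(a - b)
    return min(d, 1.0 - d)

# Hypothetical example pairs (my node or a peer, one of its peers); the real
# data would come from the FOAF locations in the spreadsheet above.
pairs = [(0.12, 0.87), (0.12, 0.15), (0.40, 0.91), (0.40, 0.43), (0.77, 0.02)]
lengths = np.array([link_length(a, b) for a, b in pairs])

# Ideal CDF of the log-distributed model: F(d) = log(d/D_MIN) / log(D_MAX/D_MIN),
# with D_MIN an assumed cutoff (the 1/d density needs a lower bound).
D_MIN, D_MAX = 1e-4, 0.5

def ideal_cdf(d):
    d = np.clip(d, D_MIN, D_MAX)
    return np.log(d / D_MIN) / np.log(D_MAX / D_MIN)

# On a log-scale plot the ideal CDF is a straight line, so deviations are easy
# to eyeball; the Kolmogorov-Smirnov statistic puts a number on the worst one.
ks_stat, p_value = stats.kstest(lengths, ideal_cdf)
print("KS statistic = %.3f, p = %.3g" % (ks_stat, p_value))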
I think this merits further investigation, involving both some theory and more data. If there is in fact a problem, we should consider modifying the link selection policy. It would also be useful to have some data on hybrid darknet/opennet nodes: are their fixed darknet links causing link length distribution problems? (Pure darknet nodes are an entirely different case, I believe.)

Evan Daniel
