Author: hobx
Date: 2006-03-08 21:16:40 +0000 (Wed, 08 Mar 2006)
New Revision: 8196

Added:
   trunk/docs/fnet.tex
Log:
document


Added: trunk/docs/fnet.tex
===================================================================
--- trunk/docs/fnet.tex 2006-03-08 21:15:06 UTC (rev 8195)
+++ trunk/docs/fnet.tex 2006-03-08 21:16:40 UTC (rev 8196)
@@ -0,0 +1,335 @@
+\documentclass[11pt,a4paper]{article}
+%\documentclass[twoside,leqno,twocolumn]{article}  
+
+\usepackage[latin1]{inputenc}
+\usepackage{amstext}
+\usepackage{graphicx}
+\usepackage{amsfonts}
+\usepackage{latexsym}
+%\usepackage{numberpar}
+\usepackage{amssymb}
+%\usepackage{amsthm}
+\usepackage{amsmath}
+%\usepackage[swedish]{babel}
+
+%%  \addtolength{\hoffset}{-1cm}
+%%  \addtolength{\textwidth}{2cm}
+%%  \addtolength{\voffset}{-1cm}
+%%  \addtolength{\textheight}{2cm}
+
+\title{Private Communication Through a Network of Trusted Connections: The 
Dark Freenet} 
+\author{Ian Clarke \thanks{FreenetProject Inc. ian at freenetproject.org}, 
Oskar
+  Sandberg \thanks{The Department of Mathematical Sciences, Chalmers
+    Technical University and Gothenburg University. ossa at math.chalmers.se}, 
Matthew Toseland \thanks{FreenetProject Inc. matthew at freenetproject.org}}
+
+
+%\input{../tex/defs.tex}
+\newcommand{\PR}{\mathbf{P}}
+\newcommand{\dist}{d}
+\newcommand{\pos}{\ensuremath{\phi}}
+
+\setlength\parindent{0pt}
+\setlength\parskip{\medskipamount}
+
+\begin{document}
+\maketitle
+
+\begin{abstract}
+\input{ab.txt}
+ \end{abstract}
+
+
+\section{Introduction}
+
+As the international growth of the Internet throughout the world, it
+has the potential to increase the availability of information for
+everybody, and to strengthen free speech and discourse. However, a
+blind belief that the Internet, in and of itself, cannot be controlled
+and will necessarily open the doors to free speech everywhere is
+unwise. In fact, as has been seen in many cases, it is indeed possible
+to control and censor much of the Internet, and users of online
+services are considerably less anonymous and more exposed than most
+believe.
+
+The Freenet Project is an ongoing effort to develop a distributed
+peer-to-peer network for publication and distribution of data. The
+system, Freenet \cite{clarke:protecting}\cite{clarke:freenet}, is
+intended to strengthen the freedom of speech by allowing anyone to
+publish anything in the network at any time.  To this end, the system
+integrates many levels of redundancy and dispersion of data throughout
+the Internets topology, which makes it difficult for an adversary to
+eliminate data from the network. The network strives to opaque, so
+that it is difficult without global knowledge to see which nodes are
+responsible are responsible for storing which data at which time, and
+flexible, so that it can survive both the everyday churn of an
+uncontrolled peer-to-peer environment and malicious attacks against
+the network itself.
+
+An important element of allowing free communication is that the
+privacy of both those who publish and those who access information
+should be protected, both from outsiders and from other people in the
+network. Earlier generations of Freenet anomymized only via a Crowds
+[] type system, which, while offering some disassociation between
+queries and those who initiate them, is vulnerable to attack in many
+ways. One of the outstanding questions of Freenet development has been
+how the network can better anonymize queries in the future, with the
+option of utilizing stronger mixing techniques [] considered possible,
+but difficult in practice.
+
+A second concern, that has come to the forefront with the actual
+deployment of Freenet, is the vulnerability of people operating nodes
+in the network. While the network has striven to dissociate users with
+the data they access and nodes with the data they serve, it has not
+hidden the presence of nodes from the world. In order to find Freenet
+nodes in earlier incarnations of the system, it has been enough to
+join the network and start operating as a node oneself: part of the
+continuous process of routing and optimizing the network, one would
+continuously learn the identities and Internet addresses of more nodes
+in the network. This means that somebody wishing to attack the Freenet
+network in its entirety would have no problem finding and identifying
+participants.
+
+In order to resolve these problems, we have radically shifted the
+direction of Freenet development. Instead of direct connections
+between nodes forming dynamically in the course of the networks
+operations, we now allow nodes to limit their connections to other
+nodes whom they trust. This model of trusted connections radically
+improves the security of the network, under the assumption that
+trusted neighbors really are trustworthy, but complicates other
+issues. Notably, since the we cannot design the overlay graph of
+connected nodes ourselves, routing from one point to another becomes
+more difficult.  Also, there is a cost to usability, especially with
+regard to barrier of entry into the network (with trusted connections,
+a new node cannot enter the network until it knows other other nodes
+that are present).
+
+We believe, however, that the benefits of the trusted connections
+model outweigh the complications. We hold that in many cases trusted
+connections are a better model for protecting the identities of those
+publishing and accessing data then traditional mixing approaches. They
+also dramatically increase the difficulty of identifying and attacking
+participants in the network, which nodes when only talk to trusted
+peers must be done by subjugating one node at a time, and then
+proceeding to attack its neighbors\footnotemark.
+
+\footnotetext{Assuming of course that nodes cannot be identified in
+  another way, such as monitoring the network for specific traffic
+  patterns.}
+
+\section{Freenet's Design and Operation}
+
+Freenet's goal is to provide a distributed system for the publication
+and access of information. Documents are stored, or inserted, into the
+network with an associated address, or key. The key (seen as binary
+string) is used both to decide where in the network to store the
+document, and to authenticate the document when it is accessed. A
+person wishing to access the document sends out a retrieve query, or
+request, which performs a routed depth first search. When the request
+finds a node containing a copy of the document, the entire document is
+returned through the search path.
+
+Freenet nodes do not make guarantees about the persistence of
+document, but keep them using a best-effort algorithm that removes the
+least-recently accessed document when a node's storage overflows. In
+fact, nodes aggressively cache many if not all the documents that they
+transfer, so a system of removing documents from the cache is
+necessary. One way of looking at the system is like a giant grid of
+caching proxy hosts, each proxying and caching for one another, but
+with no ``authoritative source'' to return to should the document not be
+in the cache. In practice, we have found that the system is relatively
+good at maintaining data when a sufficient amount of storage - about five
+to ten times redundancy over what one hopes to maintain - is available
+(see for instance [theos chapter], [that other fn
+paper])\footnotemark. Likewise, Freenet does not guarantee that data
+in the network can be found by any particular search in a bounded
+number of steps (or indeed at all) - it only attempts to make it
+likely that it will.
+
+\footnotetext{This is not to say that there isn't value in the
+  permanent storage of information. It simply says that guaranteed
+  permanent storage is not the problem Freenet is trying to solve.}
+
+\subsection{Inserting and Requesting data}
+
+We assume that to every piece of data, $D$, there is an associated key
+$K$. Each node has a routing table, which with respect to the key $K$,
+gives an ordering of nodes neighbors after their desirability as the
+destination of the query. The route is then constructed by going from
+node to node, and selecting the most desirable neighbor that is
+available and not already in the route as the next step. When this is
+not possible, the query backtracks and the route is restarted from the
+previous node.  Routes continue until they terminate due to achieving
+their purpose (in practice: finding the sought document) or terminate
+because of some heuristic (which could be as simple as the number of
+steps reaching a limit).
+
+It should be noted that, unlike in other DHT systems, a key has no
+canonical ``home'' or destination in the network. While there may be a
+node that is the most ``desirable'' destination for a particular $K$,
+it will most likely not know that. Because of this, the manner in
+which the routing tables are constructed need only be consistent, in
+that two routes for the same key should be likely to intersect, and
+well distributed, meaning that routes should not all tend toward some
+small subset of the nodes. See Figure \ref{fig:route}.
+
+\begin{figure}
+  \centering
+  
+  \caption{The routing algorithm does not need to route a query for a
+    particular key to a specific place. It only needs to route queries
+    for the same key, but started at different nodes, to the same
+    place.}
+  \label{fig:route}
+\end{figure}
+
+To search for a document in the network, matching a key $K$, one
+establishes a route for $K$. At each step of the routing, the node
+checks it's cache to see if a document associated with that key is
+stored. If such a document is found, the route terminates and the
+document is returned to the previous node in the route, which proxies
+it back towards the destination. Each node that proxies the data will
+cache it.
+
+The inverse process, that of originally inserting a document, is
+similar. A route is established, and if a document matching K is not
+found, the document $D$ is proxied up the route, with each node caching
+it. If an existing document matching $K$ is found while routing, it is
+returned down the path just like if $K$ had been requested.
+
+\subsection{Keys and Document Authentication}
+
+In order to maintain data integrity, keys are related to documents
+they denote by cryptographic means. In particular Freenet uses keys
+that either a cryptographic hash of the document itself, or a
+cryptographic hash of a name and a public asymmetric crypto key (PK)
+with which the document is digitally signed.
+
+While proxying and caching a document, each node can thus verify the
+integrity of the document. In first case, the node need only calculate
+the hash of the document, and make sure it matches the key it routed
+for. In the second case, the PK and name string are provided as
+metadata with the document. When the node receives the document, it
+checks that the name and PK hash to the key that it routed for, and
+then checks that PK verifies the signature on the data. 
+
+The first type of key, called a Content-Hash Key (CHK), is used for
+large static pieces of data. Since the key is generated from the data,
+it can neither be generated beforehand nor predicted by somebody who
+does not have the data. However, such keys are still useful, for
+example, when one document wishes to provide a link to a copy of an
+explicit, static, document.
+
+The second type of key, know a Signature Verifying Key (SVK), allows
+several documents to be published by the holder of a single asymmetric
+private key/public key pair. For anybody who knows the PK, the holder
+of the key-pair has a private namespace in which he can publish signed
+documents. This is used to implement updating schemes - either by
+including a version in the name, or having the name include a date of
+publication.
+
+In both cases, the key cannot be derived from a readable address. Thus
+finding documents in Freenet requires finding at least the publishers
+PK in some way. No method for direct keyword searching is provided,
+though some methods are possible\footnotemark.
+
+\footnotetext{One method that has been used is to create the public
+  key/private key pair for an SVK using a string as a seed to a known
+  PRNG. Then anybody who knows or guesses the string can derive the
+  public key necessary for querying for the data in the network. The
+  insecurity implied by this method is obvious however: they can also
+  derive the private key needed to insert other data under the same name.}
+
+\section{Routing in the Network}
+
+Under the trusted connection model, we have to assume that the graph of
+nodes and links between them is fixed, and cannot be optimized for
+routing purposes. This is a very different model from previous version
+of Freenet, and other DHT systems like [ref][ref][ref] which all
+construct a graph explicitly so as to make efficient routing
+possible. 
+
+In fact, the possible efficiency of routing depends on the structure
+of the network. If there is no short path between two nodes, it will
+not be possible for a document inserted at one to be reachable at
+another in a short number of steps, regardless of how well we route.
+Thus the possibility of a system the one proposed working depends
+crucially on network having certain properties.
+
+In order to hope that the system can function, we must thus rely on
+certain observations about the network of trusted connection that we
+may use. In particular, it is a subgraph of the of the world's social
+network, the dynamics which has been the subject of much study,
+initially by sociologists and experimental psychologists
+[ref][ref][ref], and increasingly by mathematical means [ref][ref].
+In particular, it is generally accepted that the worlds social network
+is a so called ``small world'', meaning that there are short paths
+between every two nodes, and that there is local clustering (people
+who share a common friend are more likely to know each other).
+
+In fact, studies of routing in the world's social network have already
+been done, starting in the 1960s with Stanley Migram's [ref] famous
+small-world experiments [ref]. These, and subsequent repetitions
+[ref][ref] seem to indicate that it is possible to find routes from
+any person to any other, using only a very small number of steps (the
+idea of ``six degrees of separation'' for the entire world is derived
+from Milgram's data.)
+
+We assume (perhaps somewhat optimistically) that the network of
+trusted connections that we can use is a small world. In order to
+route on it, we firstly must give each node an identity so that they
+can be routed to. The most direct way of assigning these identities is
+to choose them from the same space as the keys, and then try to route
+queries for a particular key to those nodes whose identities most
+closely match the key value. This technique is used by all current DHT
+systems, and will be used by us as well. In light of the observation
+described in Figure \ref{fig:route} the identities of each node can be
+selected quite randomly.
+
+Once each node has an identity, one could imagine calculating the all
+pairs shortest path for the graph, and keeping a routing table at each
+node showing where to reach queries intended to ultimately reach every
+node. This would involve a large amount of work and bandwidth when
+trying to calculate the paths, as well as network sized routing tables
+at each node, neither or which is desirable. More crucially, it would
+require one to recalculate the entire routing able every time a node
+joins or leaves the network, which is an impossibility under the
+intended algorithm.
+
+Instead, we use a method which draws off the small world models of Jon
+Kleinberg [ref]. The routing we perform is purely greedy: at each
+step, the ``desirability'' of the neighbors is ordered by the
+proximity of their identities to the route key $K$ (seen as floating
+point number between 0 and 1 with period boundary). The question then
+becomes one of trying affect the randomized assignment of identities
+such that this becomes an efficient way of routing. The method we have
+chosen for doing this is described in detail in [ref]. In a nutshell,
+nodes start with randomly selected identities, but switch with
+each other so that nodes that are ``close'' in the network topology
+also have nearby identities (it is only when this property holds that
+greedy routing makes sense in anything but the final step).
+
+Ideally, one would hope to be able to assign the identities so that
+every step in a route for $K$ brings us to a node whose identity is
+close to $K$ than the last. In practice, this may not always be
+possible, but we still use this as heuristic to show us when to
+terminate a route. Currently the route continues until $m$ steps have
+gone without finding a node closer to $K$ then the best seen so far. A
+value of $m = 10$ is used, which is motivated by attempting to balance
+a thorough search while limiting resource usage.
+
+
+\section{Redundancy and Fault Tolerance}
+
+\section{Joining and Leaving the Network}
+
+\bibliographystyle{unsrt
+}
+\bibliography{../tex/ossa}
+
+\end{document}
+
+
+%%% Local Variables: 
+%%% mode: latex
+%%% TeX-master: t
+%%% End: 


Reply via email to