[Freenet-dev] HTL and Depth probabilities in the protocol

Oskar Sandberg Sat, 3 Jun 2000 16:19:44 +0200

First, a warning: not only is the proposal for the protocol unfinished, it is
also not the protocol that is being used, or that has ever been used. It is
simply a version of what some of the developers think the protocol should look
like, stemming from discussion here and, to some extent, from the current
implementation.

As far as the 0.6 decrementation of the last HTL is concerned, this is one of
the areas that is not implemented yet. Currently, the node decrements the HTL
every time, even for the very last hop. As far as I know, Lee picked the
0.6 number straight out of the air, so it is certainly up for discussion. I
think you may have to do a little better than "magical probability" in
justifying why 0.5 is better.

Personally, I am a little wary of adding probability here at all. Since
requests on Freenet are sent to unreliable partners, it is important that we
are able to make a good analysis of how long a node should wait for a reply
before assuming that the last node the request was sent to was faulty for some
reason. Adding anything like this probabilistic decrementation increases the
variance of the time until a reply can be expected quite drastically, making
the request process slower when faulty nodes are encountered.
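
To put rough numbers on the variance point, here is a minimal sketch. It
assumes (this model is mine, not something from the spec) that a request
sitting at the last HTL value is decremented independently at each hop with
probability p, so the number of extra hops is geometric. The function name is
made up for illustration.

```python
def extra_hop_stats(p_decrement):
    """Mean and variance of the extra hops taken at the last HTL value.

    Assumed model: each hop at the last HTL value decrements with
    probability p_decrement, independently of the others, so the number
    of extra hops is the number of failures before the first success.
    """
    if not 0.0 < p_decrement <= 1.0:
        raise ValueError("p_decrement must be in (0, 1]")
    q = 1.0 - p_decrement
    mean = q / p_decrement
    variance = q / p_decrement ** 2
    return mean, variance


# Always decrementing (the current behaviour) versus 0.6 and 0.5.
for p in (1.0, 0.6, 0.5):
    mean, variance = extra_hop_stats(p)
    print(f"p={p}: mean extra hops={mean:.2f}, variance={variance:.2f}")
```

With p = 1 there is no spread at all, which is what makes a fixed timeout easy
to reason about. Any p below 1 adds a tail that the timeout has to cover.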

I'm even more skeptical of probabilistic incrementation of the Depth. That
field is indeed a bit of a security issue, but the whole reason it exists is so
that nodes can be sure that they can give the response a large enough HTL
value. If you put yourself in a situation where the HTL of the response has to
be guessed anyway, then you might as well not have a Depth.

I'm also not sure where you got the number 1/(3*2^r) for the probability. If
it increments from 1 to 2 with a .5 probability, then the probability of Depth
being k less than the actual depth of the message would be (.5)^(k+1), AFAIK.
Where do you get the 3 from? (Not that it matters, just add 1 to your
suggested r value.)
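
The "just add 1" remark can be checked with exact arithmetic. This is only a
sketch: it takes the 1 in 3*2^r figure from the quoted message and the halving
form above at face value, evaluated at k = r, and the function names are made
up for illustration.

```python
from fractions import Fraction


def loss_bound_quoted(r):
    # The figure from the quoted message: 1 in 3*2^r.
    return Fraction(1, 3 * 2 ** r)


def loss_bound_halving(r):
    # The halving form (.5)^(k+1), evaluated at k = r.
    return Fraction(1, 2 ** (r + 1))


r = 10
print(loss_bound_quoted(r))       # 1/3072
print(loss_bound_halving(r))      # 1/2048
print(loss_bound_halving(r + 1))  # 1/4096
```

So with r = 11 the halving form is already below the 1 in 3072 target that the
quoted message reaches with r = 10.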

Regarding how often the DataSource should be reset, I agree that the suggestion
taken up in the spec doesn't make much sense. But on the other hand, half the
time is too often. Forgetting for a second that a reply message has to start
somewhere, the number of hops away that the DataSource points will be
geometrically distributed, with an expected value of one divided by the
probability used. I think having the DataSource point only two steps away on
average will not result in enough "path compression" for us to get the ideal
performance out of the network. What the value should be is hard to say;
hopefully we will be able to do simulations to test these things soon. My
guess would be that 3-4 should be about correct if we target that requests
should take 6-8 hops at most.
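
For reference, a small sketch of the geometric distribution in question. It
assumes every node resets the DataSource independently with the same
probability and, as above, ignores where the reply starts. The function names
are made up for illustration.

```python
from fractions import Fraction


def expected_distance(p_reset):
    """Expected number of hops away that the DataSource points (1/p)."""
    return 1 / p_reset


def prob_distance(k, p_reset):
    """Probability that the DataSource points exactly k >= 1 hops away."""
    return (1 - p_reset) ** (k - 1) * p_reset


for p in (Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)):
    tail = 1 - sum(prob_distance(k, p) for k in range(1, 4))
    print(f"p={p}: expected distance={expected_distance(p)}, "
          f"P(more than 3 hops)={float(tail):.3f}")
```

A reset probability of 1/2 gives an expected distance of 2, while 1/3 and 1/4
give the 3 and 4 guessed at above.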

I also support the suggestion that nodes which are currently serving below
their desired capacity can reset the DataSource more often, while nodes that
are beginning to feel bogged down should be able to reset it less often or not
at all. That sort of load balancing seems to fit very well with the nature of
the network.
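
One way this could look, purely as an illustration: the linear rule, the
default base probability of 0.25, and all the names here are hypothetical and
not taken from the spec or the current implementation.

```python
def reset_probability(load, capacity, base=0.25):
    """Hypothetical load-dependent DataSource reset probability.

    Gives `base` at 50% utilization, up to twice that when idle, and
    zero once the node is at or above its desired capacity.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    utilization = load / capacity
    p = 2.0 * base * (1.0 - utilization)
    # Clamp so the result is always a valid probability.
    return max(0.0, min(1.0, p))


print(reset_probability(0, 100))    # idle node resets more often
print(reset_probability(50, 100))   # half loaded: the base probability
print(reset_probability(120, 100))  # overloaded: never resets
```

Any monotonically decreasing rule would do the same job; the linear one is
just the simplest to read.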

Clients look like transient nodes to the nodes they connect to. Having a bunch
of bad references in nodes' stores is not worth the very weak pseudo-anonymity
of having clients look almost like a normal node. Someone who really wanted to
check could simply make a request to the DataSource address anyway. People
should run their own nodes, or use a trusted party's node for the first step.

As for having the meta-data within the values hashed for the CHK: for
CHK-indexed data it obviously will be. The two-way hash will allow the validity
of the meta-data to be checked without having the entire rest of the data.
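
To illustrate the idea only: the construction below is an assumption, not the
layout from the proposal. It hashes the meta-data and the data separately and
then hashes the two digests together, so that someone holding the key, the
meta-data, and just the digest of the data can check the meta-data. SHA-1 and
the function names are likewise picked for illustration.

```python
import hashlib


def digest(blob):
    # SHA-1 is used here only as an example hash function.
    return hashlib.sha1(blob).digest()


def make_key(metadata, data):
    """Assumed two-level key: hash of (hash of meta-data + hash of data)."""
    return digest(digest(metadata) + digest(data))


def verify_metadata(key, metadata, data_digest):
    """Check the meta-data against the key without the data itself."""
    return digest(digest(metadata) + data_digest) == key


metadata = b"Content-Type: text/plain"
data = b"the document body"
key = make_key(metadata, data)

print(verify_metadata(key, metadata, digest(data)))           # True
print(verify_metadata(key, b"tampered", digest(data)))        # False
```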

On Sat, 03 Jun 2000, Chris.Studholme at canada.com wrote:
> Hi,
> 
> I'm new to the freenet development effort, but I consider this project to
> be very important for the sake of freedom on the internet, so I wish to
> contribute, eventually code, but for now I'll criticize.  
> 
> I'm not sure if version 1.0 of the protocol as published on the web site
> is the most up-to-date version.  If there is a more recent version
> available, could someone please point me at it.  
> 
> I would like to know why it was chosen in sections 3.1.3.2 and 3.1.3.3
> that HTL and depth should not be updated with a probability of 40%.
> Obviously, 0% is wrong and 100% is very wrong.  Without any further
> knowledge of the optimum value, I would pick 50% as the optimum value to
> quote in the spec (obviously node operators can ignore the spec, but it
> should be in their best interest not to).  I know a little about
> information theory and consider 50% to be a somewhat magical probability.
> I'm worried that someone might find a statistical attack on freenet that
> depends on the asymmetry of 40/60.  Unless someone has some proof that
> 40/60 is closer to being optimal than 50/50, you should quote 50/50 in the
> spec.  If I can think of a better reason why, I'll post it here.
> 
> Related to these probabilities is the issue of how to initialize HTL when
> replying to a request with Data.Send (section 4.4.1).  Ignoring for the
> moment the recommended addition of a random number to the Depth of the
> request, there is a certain probability that the Depth of the request is
> too small.  Offsetting this is the probability that the Send.Data reply
> will outlive its HTL.  I figure the probability of a Send.Data message not
> surviving all the way back to the originator of the request is 1 in 3*2^r,
> where r is a small constant added to HTL when initialized (I used the
> 50/50 probability described above).  Therefore, to ensure that no more
> than 1 in 3072 messages, say, fail to make it back, choose r=10 (ie. HTL =
> Depth from request + r).  r=10 is just a suggestion here.  A small random
> constant can also be added in to obscure the source of the data.  Unless
> I'm misunderstanding something, this information should appear in the
> spec.
> 
> Another concern is with section 4.4.2:
> 
>     If it does choose to locally store the data, it may wish to replace
>     the node address in the DataSource with its own. Nodes should do this
>     with some random probability to help disguise the original source of
>     the message.  One suggested way to do this is to replace the
>     DataSource value with the present node's address with a probability of
>     1 divided by the value of the Depth header field.
> 
> I have a suggestion.  How about nodes replace DataSource with their own
> address with a probability of 50%?  It should be the goal of freenet to
> not propagate the source address of the data too far.  With a 50%
> probability you get a 50% probability of propagating beyond the first
> node, 25% beyond the second, 12.5% beyond the third, and so on.  There has
> been talk about nodes temporarily replacing DataSource headers with their
> own address with probability 1 to bootstrap themselves into the freenet.  
> 50% should be enough to get most nodes involved quickly so this
> bootstrapping idea shouldn't really be necessary.  
> 
> I'm not sure, but I suspect the current protocol works differently for
> nodes vs. clients.  For improved anonymity, clients should appear to nodes
> as other nodes (I realize people are encouraged to run their own nodes,
> but that may not be enough).  This idea has an obvious problem stemming
> from the fact that clients are transient and nodes who store a client's
> address as DataSource will later have problems trying to contact the
> non-existent node.  Not propagating DataSource too far as described above
> would help with this problem.  What do others think about this?
> 
> The final issue I would like to raise is concerning meta data.  I think
> all meta data (public and private) needs to be protected by the CHK and
> nodes should be discouraged from passing any meta data not protected
> within the CHK.  I'm worried that without this condition, meta data can be
> added, altered, and deleted by nodes at will as the document is passed
> around.  The only meta data (outside the CHK) passed along with a document
> should be the CHK itself, the document size, and DocumentSource (limited
> as discussed above).  All other meta data should be appended by the
> document creator and rendered invariant from then on.  If this is already
> the case, just ignore me as I misread the spec.
> 
> I hope I'm not out of order here.  I'll try to post in small chunks 
> in the future.  Thanks.
> Chris.
> 
> 
> 
> _______________________________________________
> Freenet-dev mailing list
> Freenet-dev at lists.sourceforge.net
> http://lists.sourceforge.net/mailman/listinfo/freenet-dev
-- 

Oskar Sandberg
md98-osa at nada.kth.se
