To: [EMAIL PROTECTED] and mnet-devel

[Note that replies sent to mnet-devel will be held by the mailing list software 
pending my approval.  I will approve *all* replies which are not about penis- 
enlargement or Nigerian millions, and I'll also add the sender's e-mail to 
the "automatically approved senders" list.]


Hi there!

I'm Zooko, coordinator of the <a href="http://mnet.sf.net/"> Mnet project </a>.  
You might be interested in these sections of our FAQ:

http://mnet.sourceforge.net/faq.php#diff-Mnet-Freenet
http://mnet.sourceforge.net/faq.php#diff-Mnet-Freenet-2

Ian Clarke asked me to comment on the NGrouting scheme.  We've had similar 
problems, and we've developed similar but not identical solutions for Mnet.

I'm reading <a href="http://freenet.sourceforge.net/index.php?page=ngrouting"> 
this doc </a> and I have a few questions and comments.

Note: my motivation in writing this is *not* to proclaim that Mnet is better 
than Freenet.  It is to help both of our development efforts by learning from 
the other.  Nonetheless, the fact that I do direct comparisons of specific 
techniques from our respective projects, and that I point out potential 
problems in Freenet's design, will probably trigger the "us vs. them" reflex 
in your brain and engender strong emotions.  Please be polite in your 
response, and I'll do the same for you.


Some questions:

 * What does "must be efficiently serializable" mean?

 * How are queries routed *alternatively* in order to attempt to bypass a Data 
   Not Found ("DNF") event?  (I'm especially interested in this question, as 
   I haven't figured out how I want to do it in Mnet yet. ;-))

 * When do Freenet nodes establish links to newly discovered peers (discovered 
   through DataSource information)?  Are Freenet nodes expected to use 
   NGrouting measurements to choose when to link to new peers?


Some comments:

Unfortunately, Mnet's solution to this routing problem is not yet documented as 
nicely as the NGrouting design is.  However, Mnet's approach is simple enough 
that I can relate it while discussing Freenet's.

It seems like there are two components of the algorithm which are separable 
from one another, both in Mnet and in Freenet.  One is how to collapse the 
various kinds of measurements of performance -- latency, throughput, hit 
rate (== 1-DNF rate), connection-failure-rate -- into a single scalar.  The 
other is how to combine multiple such scalars collected through history into a 
single scalar prediction of which of your peers will likely perform best on 
the query that you are about to send.

As to the first -- summarizing measurements into a single scalar -- both Mnet 
and Freenet do this, although it isn't the only option -- one could have a 
more complicated system that keeps these measurements as separate scalars.  

Mnet's current algorithm is simpler than Freenet's, and comparing them might 
be useful.  Mnet's algorithm is to store two numbers: the historical hit rate 
and the historical turnaround time (the time elapsed between initiating the 
request and getting the resulting data).  If an Mnet node initiates a request 
and it does not get the resulting data, then this has no effect on the 
historical turnaround time, but it lowers the historical hit rate.

The way these two values are combined rests on the observation that the inverse 
of latency is rate.  That is, if X is the average number of seconds you wait 
to get a block of data, then 1/X is the average number of blocks of data you 
get per second.  If Y is the average hit rate -- the number of requests which 
yielded the correct data divided by the total number of requests -- then 
(1/X) * Y is the average rate at which you receive blocks, averaged over all 
the requests you send, including the ones that fail.  
(It was my wife, Amber Wilcox-O'Hearn, who pointed this out to me.)

So Mnet combines all kinds of failure (stored in hit rate) and performance 
(stored in average turnaround time) into a single scalar: (1/latency) * hitrate.
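The rule is simple enough to sketch in a few lines of Python (the function name and exact form here are mine, not Mnet's actual code):

```python
def peer_score(avg_turnaround_secs, hit_rate):
    """Collapse the two historical measurements into one scalar.

    avg_turnaround_secs: mean time from sending a request to receiving
        the data, measured over successful requests only.
    hit_rate: successful requests / total requests, in [0.0, 1.0].

    1/latency is a rate (blocks per second when a request succeeds);
    multiplying by the hit rate discounts it by the probability that
    a request succeeds at all.
    """
    return (1.0 / avg_turnaround_secs) * hit_rate

# A peer that answers in 2s with a 90% hit rate scores higher than one
# that answers in 1s but finds the data only 40% of the time:
fast_flaky = peer_score(1.0, 0.4)    # 0.40
slow_steady = peer_score(2.0, 0.9)   # 0.45
```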

As I said, this is simpler than Freenet's technique, which keeps track of DNFs 
and connection failures separately, and attempts to heuristically separate 
"false miss" DNFs -- where the data is present in Freenet, but the node didn't 
find it -- from "true miss" DNFs -- where the data isn't present in Freenet.

One potential problem with that approach is that it rests on the 
assumption that true DNFs -- queries for data that is absent from Freenet -- 
are evenly spread throughout the keyspace.  I can imagine cases where that 
assumption becomes untrue, even for extended periods of time or for 
significant fractions of the keyspace.


Okay, on to the "history and combination" part, which is where NGrouting gets 
*really* sophisticated.

If you haven't already read the NGrouting doc, you probably ought to before 
reading this.

Mnet's history and combination is again relatively simple.  The "history" part 
is accomplished via an exponential fall-off filter:

historicalvalue := historicalvalue * 0.9 + newsample * 0.1

You can of course replace "0.9" and "0.1" with other constants of your choice, 
provided that they sum to 1.0.  It's actually slightly more complicated than 
this, since we store variance as well as mean, but you get the idea.
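A minimal sketch of that filter, keeping variance alongside the mean (the variance recurrence below is one standard choice for an exponentially weighted estimate; Mnet's exact formula may differ):

```python
class HistoricalStat:
    """Exponential fall-off filter: old history decays by 0.9 per
    sample, and each new sample contributes 0.1."""
    DECAY = 0.9

    def __init__(self, initial=0.0):
        self.mean = initial
        self.variance = 0.0

    def update(self, sample):
        # Deviation from the old mean, used in the variance update.
        delta = sample - self.mean
        self.mean = self.DECAY * self.mean + (1.0 - self.DECAY) * sample
        # One standard exponentially weighted variance recurrence.
        self.variance = self.DECAY * (
            self.variance + (1.0 - self.DECAY) * delta * delta)

hit_rate = HistoricalStat(initial=1.0)
hit_rate.update(0.0)   # a miss: mean falls from 1.0 to 0.9
```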

This is similar to the *vertical* component of the NGrouting scheme, where "0.1" 
plays the role of the "forgetfulness" level.

The "combination" part is trivial in Mnet: the entire keyspace is summed up 
into the single "latency" and "hitrate" scalars.

Obviously this can work only because the job of partitioning the key space is 
separated from the which-peer-to-route-to question in Mnet.  (That job will be 
performed, in Mnet v0.7, by a separate mechanism as part of the DHT-like 
routing.)

The two-dimensional approach Freenet uses is interesting, and as far as I can 
tell it is a reasonable way to combine the keyspace-related performance 
measurements that Freenet wants.

I'm eager to find out how it works in practice.  Unfortunately I don't think 
it will be easy to make any solid observations of how it works, especially the 
sophisticated and complex bits like the influence of the minimal-DNF 
subtraction and the two-dimensional history/combination technique.  It will 
require some really good simulations or a really good self-monitoring mechanism 
to provide a clear enough picture for us to judge the impact of those details, 
in my opinion.

There is one funny thing in the two-dimensional history: the horizontal motion 
of the ten points means that the impact of a new sample is different depending 
on whether a point has already been dragged across that X position.  I mean: 
suppose a new sample comes in at X=0.25, and two of the ten points need to be 
updated.  Well, if this is a pristine set where the first point is at X=0.1, 
the second point is at X=0.2, and so on, then the second and third points are 
the two that will be moved toward the new sample.

However, if a lot of samples have previously been registered at 0.21, which 
have "dragged" the third point all the way over so that it is currently at 
0.24, then this new sample at 0.25 will affect the third and *fourth* points.
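Here is a toy illustration of that effect, under my reading that the two points updated are the ones bracketing the sample's key position (Freenet's actual update rule may differ):

```python
import bisect

def bracketing_pair(points, x):
    """Indices of the two reference points that bracket sample key x.
    `points` is assumed sorted and to enclose x."""
    i = bisect.bisect_left(points, x)
    return (i - 1, i)

# A pristine set of ten points spread evenly across the keyspace.
pristine = [0.1 * (k + 1) for k in range(10)]   # 0.1, 0.2, ..., 1.0
bracketing_pair(pristine, 0.25)   # -> (1, 2): the 2nd and 3rd points move

# After many samples near 0.21 have dragged the 3rd point to 0.24,
# the same sample at 0.25 lands in a different interval:
dragged = list(pristine)
dragged[2] = 0.24
bracketing_pair(dragged, 0.25)    # -> (2, 3): now the 3rd and 4th move
```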

This isn't necessarily bad, but it was a little surprising to me when 
I realized it, and it is a way in which NGrouting is not scale-free.  That is: 
suppose there is a small Freenet where one node receives requests which are 
evenly distributed across 0.4 of the keyspace.  Suppose there is a large 
Freenet where one node receives requests which are evenly distributed across 
0.004 of the keyspace.  The former will move 5 of the points after a few dozen 
new samples have come in.  The latter will move only 2.


Okay, that's all I have to say about the details of the algorithms.


Regards,

Zooko

http://zooko.com/

