On Friday 04 September 2009 05:48:33 Evan Daniel wrote:
> After some discussions with Matthew on IRC, I've started making an
> attempt to gather meaningful performance statistics on the live
> opennet.  After some consideration of what the minimal set of useful
> stats gathering to add was, we decided on histogram data on accepted
> incoming requests (ie remotely originated only) grouped by hops to
> live.  Toad graciously added some stats collection, which you can see
> on the stats page in the latest testing build.  Displayed is the count
> of incoming requests for each htl value, along with the counts of
> requests that succeeded locally and succeeded remotely (meaning the
> request was forwarded onward and then succeeded downstream).  This
> data is presented for both CHK and SSK requests.
> 
> A sample line:
> 18    57.711% (690,818,2613)  2.678% (50,69,4443)
> 
> In order, that's htl for this line, CHK overall success rate, local
> success count, remote success count, and total count, followed by the
> same stats for SSK requests.  Note that the data does not include
> requests we received and rejected due to eg overload or loops.
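
For anyone who wants to play with these numbers, here is a rough Python sketch of
how the stats-page lines might be parsed, assuming they match the sample format
exactly (the field order is as described above; the helper name is just
illustrative):

    import re

    # htl, CHK rate% (local, remote, total), then the same three counts for SSK
    PATTERN = re.compile(
        r"(\d+)\s+"
        r"([\d.]+)% \((\d+),(\d+),(\d+)\)\s+"
        r"([\d.]+)% \((\d+),(\d+),(\d+)\)"
    )

    def parse_stats_line(line):
        m = PATTERN.match(line.strip())
        (htl, chk_rate, chk_local, chk_remote, chk_total,
         ssk_rate, ssk_local, ssk_remote, ssk_total) = m.groups()
        return {
            "htl": int(htl),
            "chk": (int(chk_local), int(chk_remote), int(chk_total)),
            "ssk": (int(ssk_local), int(ssk_remote), int(ssk_total)),
        }

    row = parse_stats_line("18    57.711% (690,818,2613)  2.678% (50,69,4443)")
    local, remote, total = row["chk"]
    print((local + remote) / total)   # ~0.577, matching the quoted CHK rate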
> 
> My hope is that, in the medium term, I can develop statistical methods
> to meaningfully evaluate Freenet performance in the real world, rather
> than merely in simulation.  A number of significant changes have been
> made, and more are planned, that should have an impact on performance
> (routing, data retention, etc).  However, we have not applied a
> scientific approach to evaluating the impact of these changes, largely
> due to concerns that most data that is easy to gather is horribly
> noisy, and therefore difficult to draw conclusions from.  Short term,
> I hope to learn more about Freenet's operation as a network with
> emergent properties; in the medium term, I hope to be able to evaluate
> the performance impact of changes to routing, caching, and network
> topology.
> 
> (An aside before I get to the real analysis: CHKs have a good success
> rate; over 50% at high htl.  This is actually fairly good, considering
> that I suspect the data is heavily skewed by re-requests for queued
> data that will take many tries to find on average.  That is, I suspect
> the success rate on first requests for CHKs is significantly higher
> than the data can convey.  SSKs have an abysmal success rate, but the
> vast majority of SSK successes occur on high-htl requests.  From this
> I conclude that SSKs are actually quite reliable, but that the
> majority of requests for them are for things like Frost or FMS
> messages that have not yet been inserted.)

My peaks tend to be more like the 30s or 40s, percent-wise. But even that is much 
better than expected, given the overall figures we've been using in the past. It is 
encouraging.
> 
> This data is awkward to work with for a variety of reasons; I think
> there is actually quite a lot I can do with it, but teasing it apart
> will take some care.  For example, the probabilistic htl causes some
> weird effects.  The number of requests that traverse at least two
> nodes should be strictly less than the number that traverse at least
> one node (on average; we get a random sample, so we might not always see
> this).  This is not reflected directly in the data because of
> probabilistic htl: requests spend on average two hops at htl=18, but
> always spend only one hop at htl=17.  htl=1 exhibits a similar
> behavior.  One would also expect to see higher global success rates at
> htl=18 than htl=17, both because more nodes still remain to search,
> and because more of the requests remaining at htl=17 are "hard"
> requests.  However, observer bias muddies the data: a request that
> succeeds at the first hop will be observed by only one node (at
> htl=18).  A failing request, though, will be observed by an average of
> two nodes at htl=18.  So observer bias means that the observed global
> htl=18 success rate will be lower than the actual rate.
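
The observer-bias effect is easy to convince yourself of with a toy simulation.
The sketch below is mine, not derived from the real data: it assumes every node
holds the requested key with some fixed probability, that the htl drops from 18
only with probability 0.5 per hop (the "two hops on average" figure above), and
that it drops by exactly one per hop below that, ignoring the htl=1 special case.
Even in that toy model the per-observation success rate at htl=18 comes out below
the per-request rate:

    import random

    def simulate(n_requests=200_000, p_find=0.10, max_htl=18, seed=0):
        # Toy model only: each node has the data with probability p_find, the
        # htl stays at 18 with probability 0.5 per hop, and drops by exactly
        # one per hop below that; requests die when the htl reaches zero.
        rng = random.Random(seed)
        obs = obs_succ = 0     # htl=18 observations / those ending in success
        req = req_succ = 0     # requests / those that eventually succeed
        for _ in range(n_requests):
            htl, obs_at_18, success = max_htl, 0, False
            while htl > 0:
                if htl == max_htl:
                    obs_at_18 += 1
                if rng.random() < p_find:          # data found at this node
                    success = True
                    break
                if htl == max_htl:
                    if rng.random() < 0.5:         # probabilistic decrement
                        htl -= 1
                else:
                    htl -= 1
            req += 1
            req_succ += success
            obs += obs_at_18
            obs_succ += obs_at_18 if success else 0
        print("per-request success rate at htl=18:     %.3f" % (req_succ / req))
        print("per-observation success rate at htl=18: %.3f" % (obs_succ / obs))

    simulate()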
> 
> Before I attempted to draw any conclusions about success rates, I
> decided to examine the simple histogram of total requests vs htl.  I
> collected three sets of data, all from my node.  (Complete raw data
> can be found at the end of this email.)  I then performed a simple
> chi-squared test to check whether the distributions match; they don't.
>  I can't actually give a p-value, as my spreadsheet exhibits an
> underflow.  The result was a chi-square statistic of 1926.5 with 34
> degrees of freedom.
> 
> Incoming request htl distribution varies across the samples.
> Plausible causes include varied time of day, varied local node usage
> (the data are for remote requests, but it might have an indirect
> impact), and varied local network conditions.  By far the largest
> variation between samples (as measured by contribution to the test
> statistic) comes from the htl=18 and 17 data.  In what would normally
> be very bad statistical practice, I tried removing those rows from the
> data.  At this point, statistical significance was at merely
> astronomical levels: a p-value of 1.31E-48 was obtained.  Given this
> extreme a p-value, I am confident that corrections for performing
> multiple tests on the data, and peeking at the data while performing
> those tests, still leave a result that is highly significant.  (I have
> not actually performed said corrections.)
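
For what it's worth, the underflow is avoidable: scipy, for example, can report the
tail probability on a log scale. A sketch of the same homogeneity test follows; the
counts are placeholders, not Evan's data, with one row per sampling run and one
column per htl bin:

    import numpy as np
    from scipy.stats import chi2, chi2_contingency

    # Placeholder counts only -- substitute the real per-htl request totals
    # from the three sampling runs.  Rows are runs, columns are htl bins
    # (truncated to four bins here for brevity).
    counts = np.array([
        [2613, 310, 285, 150],
        [2400, 355, 260, 170],
        [2805, 290, 300, 145],
    ])

    stat, p, dof, _expected = chi2_contingency(counts)
    print("chi-square = %.1f, dof = %d, p = %.3g" % (stat, dof, p))

    # If p underflows, report it on a log scale instead:
    print("log10(p) = %.1f" % (chi2.logsf(stat, dof) / np.log(10)))

    # Evan's "remove the htl=18 and 17 rows" step is just a column slice here:
    stat2, _, dof2, _ = chi2_contingency(counts[:, 2:])
    print("without the top two htl bins: chi-square = %.1f, dof = %d" % (stat2, dof2))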
> 
> From this analysis, I conclude that gathering data that is
> statistically useful and free of confounding factors will take some
> effort.  I think the appropriate collection technique is to gather the
> same basic data, but to group the samples into hourly sampling
> intervals, and collect data across several nodes.  This would help
> control for time of day effects and give some idea of how much node to
> node variability exists.  In order to control for varied local usage
> patterns and their effects, I think the number of local requests
> originated during each hour should also be recorded, along with the
> number of external requests rejected (both CHK and SSK for each of
> those).

It looks to be a good deal more feasible than gathering useful request-time data, 
which is ludicrously noisy and does not generally produce sufficient data volume.
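
As a starting point, one possible column layout for those hourly records (a sketch
of mine, not a decided format; it just covers the fields Evan lists above):

    import csv, sys

    # Hypothetical column layout for one hourly record -- a sketch, not a spec.
    # One row per (node, UTC hour, htl); the per-hour counters that do not vary
    # with htl (local requests originated, external requests rejected) are
    # simply repeated on every row for that hour.
    FIELDS = [
        "node_id", "utc_hour",                        # e.g. "2009-09-04T05"
        "htl",
        "chk_local_success", "chk_remote_success", "chk_total",
        "ssk_local_success", "ssk_remote_success", "ssk_total",
        "local_chk_requests", "local_ssk_requests",   # originated this hour
        "rejected_chk", "rejected_ssk",               # external requests rejected
    ]

    csv.writer(sys.stdout).writerow(FIELDS)           # header line for the log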
> 
> Comments on my proposed avenues for investigation would be much
> appreciated, as would volunteers to collect data.  I think I need data
> from a minimum of 5 nodes in order to confirm that there are not
> drastic local effects, though more might be nice after I've done an
> initial analysis.

We need to make this easy. At a minimum, we could record data for complete hours, 
tagged with the UTC time, to a log file which could then be pulled by some external 
mechanism. A more sophisticated approach might involve traceable volunteers' nodes 
automatically inserting their data so that it can be polled...
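
For the minimal version, something along these lines would probably do. This is a
Python sketch purely for illustration (the class name and line format are
hypothetical, and the real code would live in the node's Java stats collection):
keep per-htl counters in memory, and when the UTC hour rolls over, append one line
per htl for the hour just completed and reset.

    from collections import defaultdict
    from datetime import datetime, timezone

    class HourlyStatsLog:
        """Toy sketch: per-htl CHK counters, flushed once per completed UTC hour.

        Flushing happens lazily, on the first event after the hour changes;
        SSKs and the reject/local-request counters are omitted for brevity.
        """

        def __init__(self, path):
            self.path = path
            self.hour = None                              # e.g. "2009-09-04T05"
            self.counts = defaultdict(lambda: [0, 0, 0])  # htl -> [local, remote, total]

        def record(self, htl, outcome):        # outcome: "local", "remote" or "fail"
            now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H")
            if self.hour is not None and now != self.hour:
                self._flush()
            self.hour = now
            row = self.counts[htl]
            if outcome == "local":
                row[0] += 1
            elif outcome == "remote":
                row[1] += 1
            row[2] += 1

        def _flush(self):
            with open(self.path, "a") as f:
                for htl, (local, remote, total) in sorted(self.counts.items()):
                    f.write("%s %d %d %d %d\n" % (self.hour, htl, local, remote, total))
            self.counts.clear()

    log = HourlyStatsLog("hourly-stats.log")
    log.record(18, "remote")    # called wherever an incoming request completes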

All in all, very interesting! Ian decreed years ago that the age of alchemy was 
over, but since then it has crept back in, largely because of the difficulty of 
getting any useful measurements out of what is a very chaotic network.
