On Sep 30, 2009, at 10:20 PM, Rom Walton wrote:

> Paul,
>
> Have you read any of the BOINC papers
> (http://boinc.berkeley.edu/trac/wiki/BoincPapers) lately?
>
> There have been more than a few research papers on how BOINC works, what
> failure rates one can expect, and how that compares to other distributed
> computing models.
>
> I recall reading a paper on how the host reputation system and adaptive
> replication system increased overall project efficiency without
> incurring much if any increase in error rates. Problem is I can't seem
> to find it at the moment. Time for me to go to bed I suppose.
>
> In any case, these things are constantly being looked at. BOINC is
> still a research project.
Yes, and I have even read some whose authors are now repudiating their own conclusions. The problem is that for most of this discussion I have been ignoring, because they are not relevant to my proposal, the errors that can be and are detected by the validation process, which is what those discussions have been based on, unless I missed something.

The errors and classes of errors I have been talking about are those where, for example, two machines return the same identical output, which passes validation but is incorrect. The number of detections of these errors would go up with increased redundancy, but the trend in the BOINC system is to reduce redundancy and to assume that these errors are negligible. The problem is that we have no idea how large this set is. It may indeed be trivial ... but science is not about declaring an error rate trivial; science is about identifying a potential source of error and taking steps to eliminate it, or at least measuring how large it is and determining whether it is an issue. We have done neither. Which is another reason I have stated time and again that the initial test rate may be high, but it is possible that once we have made the measurements we can reduce the overhead.

As to that paper, I was sure I saved a copy, and I cannot find it either ... :( More importantly, the conclusions in that paper merely point back to my question: what if the validator is not catching all the errors? Adaptive replication merely increases the total pile while reducing the accuracy, until and unless a malfunctioning host returns enough sufficiently bad tasks that it is no longer eligible for AR and is marked unreliable. And on projects like SaH I suspect the problem is large even if the importance is less ... The classic bad example is the over-clocker who refuses to believe that his system may be unstable when it fails to run Rosetta, because it runs SaH "fine" ...
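To make that failure mode concrete, here is a minimal sketch of a majority-style quorum check. This is not BOINC's actual validator code; the function names and the systematic-fault model are purely illustrative. The point is that a host pair sharing the same deterministic defect (bad compiler output, marginal hardware, etc.) produces identical wrong answers that agreement-based validation accepts:

```python
def run_task(host_bias, true_value=42):
    """Simulate a host computing a task. A host with a systematic
    fault (host_bias != 0) deterministically returns the same wrong
    answer every time; a healthy host (bias 0) returns the truth."""
    return true_value + host_bias

def quorum_validate(results, quorum=2):
    """Minimal agreement-based validator: accept any value reported
    by at least `quorum` hosts; otherwise report no consensus (the
    task would be reissued)."""
    for value in set(results):
        if results.count(value) >= quorum:
            return value
    return None

# Two healthy hosts agree on the correct answer -- validation works.
assert quorum_validate([run_task(0), run_task(0)]) == 42

# Two hosts sharing the same systematic fault also agree -- the
# identical-but-wrong result passes validation undetected.
accepted = quorum_validate([run_task(7), run_task(7)])
assert accepted == 49 and accepted != 42
```

Agreement tells you the results are consistent, not that they are correct; measuring how often consistent-but-wrong pairs occur would require the kind of independent spot-checking argued for above.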
Anyway, the discussion point is still the distinction between a result and a correct result ... and correctness, like quality, cannot be imposed at the end of the process. If I can find my copy of that paper I will e-mail you a copy (or the link, if I find it on-line) ...

Well, I am done with the topic; it is obvious that I am pretty much alone, again ... familiar territory ... I just hope history does not prove I was right again ...

_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.
