On Sep 30, 2009, at 10:20 PM, Rom Walton wrote:

> Paul,
>
> Have you read any of the BOINC papers
> (http://boinc.berkeley.edu/trac/wiki/BoincPapers) lately?
>
> There have been more than a few research papers on how BOINC works,
> what failure rates one can expect, and how that compares to other
> distributed computing models.
>
> I recall reading a paper on how the host reputation system and
> adaptive replication system increased overall project efficiency
> without incurring much if any increase in error rates.  Problem is
> I can't seem to find it at the moment.  Time for me to go to bed I
> suppose.
>
> In any case, these things are constantly being looked at.  BOINC is
> still a research project.

Yes, and I have even read some where the authors are now repudiating
their own conclusions.

The problem is that, for most of this discussion, I have been ignoring
the errors that can be and are detected by the validation process,
because they are not relevant to my proposal.  Yet those errors are
what the discussion has been based on, unless I missed something.

The errors and classes of errors I have been talking about are those
where, for example, two machines return identical output that passes
validation but is incorrect.  The number of detections of these errors
would go up with increased redundancy, but the trend in the BOINC
system is to reduce redundancy and to assume that these errors are
negligible.
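
To make the class concrete, here is a rough Python sketch of
quorum-style validation (not BOINC's actual validator; the fault model
and names are invented).  If two replicas share the same deterministic
flaw, they agree on the same wrong answer and the agreement check
passes:

    # Minimal sketch: quorum validation detects disagreement, not
    # incorrectness.  Not BOINC code; the fault model is hypothetical.
    def compute(x, fault=0):
        # 'fault' models a deterministic systematic error (bad math
        # library, unstable hardware) shared by the affected hosts.
        return x * x + fault

    def validate(replicas):
        # Pass if all replicas agree -- the usual redundancy check.
        return len(set(replicas)) == 1

    truth = compute(3)                               # 9
    good = [compute(3), compute(3)]                  # two healthy hosts
    bad = [compute(3, fault=1), compute(3, fault=1)] # same flaw on both

    print(validate(good), good[0] == truth)  # True True  -> validated, correct
    print(validate(bad),  bad[0] == truth)   # True False -> validated, WRONG

More redundancy raises the odds that at least one healthy replica
breaks the agreement, which is why detections of this class would rise
with redundancy rather than fall.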

The problem is that we have no idea how large this set is.  It may
indeed be trivial ... but science is not about declaring an error rate
trivial ... science is about identifying a potential source of error
and taking steps to eliminate it, or at least measuring how large it
is and determining whether it is an issue.  We have done neither.

Which is another reason I have said time and again that the initial
test rate may need to be high, but it is possible that once we have
made the measurements we can reduce the overhead.
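
As a rough illustration of the sizing involved (a standard sampling
rule of thumb, not a number from any BOINC paper): assuming
independent spot checks, the audits needed to observe at least one
error of rate e with confidence c is about ln(1-c)/ln(1-e).  The
initial measurement is expensive, but once we know where we stand the
ongoing rate can drop:

    import math

    def audits_needed(error_rate, confidence=0.95):
        # Spot checks needed to see at least one bad result with the
        # given confidence, assuming independent errors at error_rate.
        return math.ceil(math.log(1 - confidence) / math.log(1 - error_rate))

    for e in (0.10, 0.01, 0.001):
        print(f"error rate {e:>5}: {audits_needed(e):>4} audited results")
    # error rate   0.1:   29 audited results
    # error rate  0.01:  299 audited results
    # error rate 0.001: 2995 audited results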

As for that paper, I was sure I had saved a copy, but I cannot find it
either ... :(

More importantly, the conclusions in that paper merely point back to
my question: what if the validator is not catching all the errors?
Adaptive replication merely increases the total pile while reducing
the accuracy, until and unless a malfunctioning host returns enough
sufficiently bad tasks that it is no longer eligible for AR and is
marked unreliable.
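
To spell out the failure mode I mean, here is a sketch of the general
adaptive-replication idea (not BOINC's actual scheduler logic; the
threshold, counters, and error model are all invented, and in this
worst case trusted work is never re-checked):

    import random

    class Host:
        def __init__(self, error_rate):
            self.error_rate = error_rate   # chance any one result is bad
            self.valid_streak = 0          # crude reputation counter

        def reliable(self, threshold=10):
            # Hosts with a long valid streak earn unreplicated work.
            return self.valid_streak >= threshold

    def run(host, n_tasks, catch_prob=0.5):
        # catch_prob: chance replication catches a bad result; trusted
        # (unreplicated) work is never checked in this sketch.
        undetected = 0
        for _ in range(n_tasks):
            bad = random.random() < host.error_rate
            if bad and (host.reliable() or random.random() > catch_prob):
                undetected += 1            # bad result enters the pile
                host.valid_streak += 1     # ...and still looks valid
            elif bad:
                host.valid_streak = 0      # caught: streak resets
            else:
                host.valid_streak += 1
        return undetected

    random.seed(1)
    print(run(Host(error_rate=0.05), 1000), "bad results passed undetected")

Once a flaky host earns the streak, every bad result it returns both
enters the pile and extends its reputation, which is exactly the
accuracy-for-throughput trade I am worried about.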

And on projects like SaH I suspect the problem is large, even if the
consequences matter less ...

The classic bad example is the over-clocker who refuses to believe
that his system may be unstable when it fails to run Rosetta, because
it runs SaH "fine" ...

Anyway, the discussion point is still the distinction between a result
and a correct result ... and correctness, like quality, cannot be
imposed at the end of the process.  If I can find my copy of that
paper I will e-mail it to you (or the link, if I find it on-line) ...

Well, I am done with the topic; it is obvious that I am pretty much
alone, again ... familiar territory ... I just hope history does not
prove me right again ...
_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.
