On Oct 1, 2009, at 2:35 AM, Raistmer wrote:

>> programming the task will rotate.  Since we know what the answer   
>> should be we are validating the ENTIRE PROCESS from one end to the   
>> other ...
> Ok, you once again described how it should be.
> But I still can't get why it is needed.  This is the biggest
> difficulty with your proposal.
> I and probably others just don't see the reason to do this.  Not how
> it could be implemented, or whether it could be implemented at all,
> but why it should be implemented at all.  That is the problem.
> What new does such entire validation give us (please take into
> account that your validation can be broken on the very NEXT task; it
> was pointed out a few times already)?  "Entities should not be
> multiplied unnecessarily," as Occam's razor tells us.  Which part of
> the current BOINC validation structure can it replace?
> None, IMO.  Well, which part of the BOINC validation system could it
> improve if ADDED to the current validation system?  My answer is the
> same: none.  Could you specify exactly which part it can improve,
> taking into account the possibility of random errors (I still see no
> accounting for this aspect in your posts)?

There are always random errors.  Also, no system of testing is going
to catch all problems, and issues that occur between tests will not be
identified until the next test.  The answer to those questions and
issues is not to say "don't test at all"; it is to figure out the
minimum we have to do to get optimal results.  I have talked about the
errors in various posts over time, but you are correct that I have not
addressed them in a concentrated way, and to be honest I am not going
to spend much time on it here because it is pretty clear and obvious
to me that minds are closed.

But, there are issues.

The FP number systems we use are approximations.  The hardware we use
to make our calculations is designed to a standard.  However, two
computers both running correctly with the "same" software can and do
return different results, simply because the IEEE 754 standard defines
modes of operation but leaves the selection of those modes to the
programmer.  Most project scientists probably don't even see the
implications of what I have just said.  But, for example, Virtual
Prairie is possibly suffering from exactly that effect.  Net result:
the same program returns different results when run on 64-bit versus
32-bit Windows.  The probable culprit is either the selection of the
rounding mode or the truncation of 80-bit internal x87 values to
single or double precision.  That is my guess, though it is possible
they are seeing the same kind of precision loss that hit Milky Way.

Milky Way has had extensive problems with the cannibalization of
precision that occurs because FP numbers are, after all, only
approximations: with a sufficient number of iterations, loss of
precision can devastate the accuracy of a model.  I worked with a
mathematician, and in fewer than 10,000 iterations we had lost
single-precision accuracy down to fewer than 3 significant digits.
Careful algorithm work and fancy footwork, and they have it under
control ...

SaH has historically had some issues with tasks from different
platforms, though most don't recall those times.  To get from here to
there, what they mostly did was reduce the strictness of the
validation to something a lot more fuzzy ...

Then we have the FDIV class of errors, where large swaths of the
computer world had calculators that returned consistent results that
were simply not correct.

Cross-platform?  Well, suffice it to recall the Cray: faster than
snot, but its FP system was sloppy and inaccurate ... still, it was
fast and good enough to get work done.  But getting values from the
68000, G5, G4, and Intel 8086 classes of machines to all agree
... ugh ...

Libraries, compilers, AMD vs. Intel, ATI vs. Nvidia, CPU vs. CUDA (or
ATI) ... again, I can go on and on about all the places where we have
differences ... and differences generate problems ...

Now, the glib answer is that error is the responsibility of the
project.  But no one has really tried to make that case and prove
that it is only the project's responsibility, because you cannot.  It
is the project's responsibility to manage its error rates and to know
them so that the research can proceed on a sound basis, but it is
properly BOINC's responsibility to do the work, because if two
projects are doing something of the same nature, then it is BOINC's
proper role to provide the tools and technology to manage that
aspect.  Just as we don't require each project to invent its own
database, why would we require them each to build the same
infrastructure for systemic error detection and management?  The
whole point of BOINC is to be the middleware that handles common
issues.  As an aside, it fails at this (IN MY OPINION) a lot more
than it should, for any number of reasons; that shows up mostly in
the fact that if you watch multiple projects, you will see the
project teams spending large amounts of time solving the same
problems that have already occurred on other projects ...

Case in point: those recent posts along the lines of "I am having
problems getting my project started" were, and are, pretty much the
same questions the last group asked 6 months ago ... and the group
before that ... and the group before that ...

Again, a very incomplete answer but here goes (again) ... :)

SaH is my favorite example, in part because it fits with my
technician background.  In essence it is a radio, and the end result
we are looking for is a radio station playing our song.  But we don't
know what our song even looks like ... so what is wrong with doing
multiple things with that proverbial one stone?  I start with a known
artificial signal, in its simplest form a single pulse, and inject it
into one node in our network.  The node processes it and returns a
result, and I can see whether the system detected the one pulse as
expected.  If so, as Martin correctly stated, we have one end-to-end
test.  Now do that over a bunch of nodes, and at this point we can
start to answer several questions: are they all detecting the one
pulse, and if so, what are the limits of error on the returns?  There
is no noise, so the answers should be identical ... but they won't
be: different OSes, different processors, different compilers, on and
on.  Now we can start to determine where the sources of these errors
are ... and, oh by the way, isolate the machines that are returning
just plain bad answers.  The side effect is that we have also done
the benchmark.

Again, this is woefully incomplete, and the naysayers have seized on
enough trivia and whatever ... Computers are good and wonderful
tools, but they are never to be trusted; sadly, the mindset in
BOINC-World is that they are never to be doubted.

There is an engineering axiom about known knowns and the like (the
MSM made fun of Rumsfeld for it), but the point of the whole thing is
that in the end, the only things that get you in trouble are the
unknown unknowns.  This is an attempt to start probing that area to
see whether there are problems out there.  Saying that there aren't
doesn't mean there aren't, and proving the negative case is
impossible ... so ... that is the point ...

Again, this is an incomplete gloss of the conceptual things to be
looked for in that dark area of unknowns ...

>> is needed to establish the operational speed of that machine in  
>> CS ... note that the point is to establish this with more  
>> reliability with  real work on real machines over real execution  
>> times so that the  instability of the benchmarks is eliminated as  
>> an issue.
> Yes, running many "almost real" tasks will improve credit
> estimation.
> But:
> 1) at too high an overhead to be valuable;
> 2) the same could be achieved by using REAL tasks and a set of
> reference PCs.

Which is why I suggest a suite.  The point is that this is not just
about one thing; we are making this change to achieve multiple goals.
ONE of them is improving the benchmark's accuracy so that it is
meaningful, which then allows us to establish a stable tie between
our benchmark results and credit awards, and eventually to eliminate
the disparities in credit awards across projects.  Why should I get
50 CS on project A for an hour's effort when project B grants 75 for
the same time?  Why should I get 15 when my claim was 35?  I would
use some generated samples, because the point is to prove that the
end-to-end system is doing exactly what it should, and some
real-world samples because, well, they are real world ... and they
are always qualitatively different from artificially generated
signals ...

>> My personal  feeling is that using other mechanisms we can fill in  
>> the gap and the  current benchmark can be eliminated ...
> Yes, your type of benchmark can replace the current benchmark, but
> it will have no valuable benefits and will just carry bigger
> overhead.

Which is because there has been so much negativity about one aspect
or another without considering the whole.  If you want, and some
have, you can harp on the fact that the "benchmark" task will take as
long as a real task and call that "waste", if you insist on ignoring
the other purposes served.  Sadly, this is exactly what has happened.
Contrary to the implications, I want to spend as little of the
collective resource on this activity as the next person.

>> The point is that instead of requiring the counting of FLOPS or
>> Loops or anything else, we establish a generalized earnings figure
>> for a specific computer using a collection of work.  The more
>> different work loads we use, the more "accurate" our estimate.
> Again, right in the base part: indeed, if the many particular
> estimations for each project are replaced by a single,
> better-established estimation, cross-project credit parity will be
> improved.
> But again, reference PCs look preferable to reference work for the
> same task.
> They will achieve the same results but with less overhead involved.
>
>
>> We make the assumption that the validator will catch errors ...
>> yet we know that the validator is a program written by people.
>> The point is that if I make a SaH signal and the program returns
>> 15 signals, there is a problem somewhere ... yet if that bad
>> answer is paired up with another bad answer that is the same, the
>> validators will accept both answers.  And more and more projects
>> are going to adaptive replication and validating on one task ...
>> so the idea that redundant computing is going to catch errors is
>> slowly being eroded ...
> 1) I'm against adaptive replication and consider the adaptive
> replication experiment on SETI beta totally messed up.
> Adaptive replication indeed lowers our faith in result validity.
> It suffers from the same flaw as your calibration-tasks idea: it
> can't account for random error or for a fast change of conditions.
> The CPU/GPU fan in an absolutely trusted PC can eat too much dust
> ... and we will have that same host returning invalid results.  How
> soon could we catch this with calibration tasks or with adaptive
> replication?  Not very soon.  And the faster the host is, the more
> invalid work it will produce before being caught.
> If and ONLY if a project can accept lowering the correctness of
> results in exchange for increased performance (please note, this is
> a trade-off: we pay precision for speed) could adaptive replication
> be used.
> And surely, if a project goes to such a measure, it needs all the
> power it can get and will not waste a fraction of it on calibration
> tasks.
>
> 2) I get your point.  Please try to get mine.
> If I make such a signal and discover 15 signals instead of one in
> the result file, I will immediately post this or e-mail Eric.  Then
> he can reproduce the issue and take measures, including bug-fixes
> in the validator.  No need to involve all participants' PCs in
> that.  I don't argue that tests and calibrations are unneeded; they
> are needed, of course!  But they are needed ON ANOTHER LEVEL of the
> hierarchy.

I get it ... and if all participants were trustworthy, there would be
no need to check up on them ... though even Reagan said "Trust, but
verify."

Besides, a million people calling Eric?

>> THIS IS NOT THE COMPLETE PROPOSAL ... there are a myriad of   
>> details ... but I know that if I make it longer no one will read   
>> it ... but this is the core ...
> Well, details usually come after basic-idea approval; no problem
> with that.

No, the arguments are, historically, about trivial details or
hyperbole ...


_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev