On Sep 28, 2009, at 11:15 PM, Raistmer wrote:

>>
>> The example I used in the past is this.  SaH is basically a signal
>> hunter.  When was the last time that a test work unit with known
>> signals in the input data was subjected to analysis?  If anyone who
>> reads this board knows this, they have not yet answered the question.
>> All the testing I know of is to take a task of real data whose
>> contents we assume we know because we have run it through the
>> software.  And because the answer today matches the answer of
>> yesterday, we assume that the software is correct.  Unless the
>> software of yesterday was bad ... then we are just making today's
>> results match yesterday's bad analysis.
>>
>
> It's a good point.  Maybe Eric could answer whether such checks were
> performed in the lab before releasing the SaH application.
> But in any case, such testing should be done only once per algorithm
> change, and it should be done in the project's lab and not on
> participants' PCs.

Here is where you and I disagree, because I am looking at a broader
picture than just in-lab proofs.  For one thing, even if we prove that
the software works on the lab computer with limited tests and alpha
testing, that neglects the well-proven need for the broader beta test.
And because that external environment is always in flux, I want to
keep testing in those broad conditions to see how SPs, video drivers,
and all that other software we are now pretending has no effect
actually affect things ... except it does ... GPU Grid is having no
end of troubles because of issues with the driver / CUDA version, and
possibly with the GTX260 cards or maybe their BIOS software (or
whatever they call it).

> The validity of the algorithm used is a fundamental question, of
> course, but it should be settled BEFORE the app goes public.
> Because such calibration tasks should calibrate the Validator in the
> first place.

Again, you are assuming that a single test, or even a test suite, will
catch all conditions.  I have talked about several potential
situations where the validator will be happy with two agreeable and
wrong answers.  It is entirely possible that I am wrong and there are
no problems out there ... I don't think so, because if there is one
thing I have been able to do, it is to monitor the boards, and what I
see does not give me confidence that we have stable systems ... but at
the moment we do no testing to validate what we "know" is true.  And
if you are not measuring your error rate, you have no idea what it
is ...
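To make that "agreeable and wrong" failure mode concrete, here is a
little sketch (illustrative Python, NOT the actual BOINC validator
code) of a quorum-of-two check happily accepting two matching but
incorrect results:

# Illustrative sketch only -- not the real validator.  A quorum check
# only verifies that replicas agree, not that they are correct.

def results_agree(a: float, b: float, tolerance: float = 1e-6) -> bool:
    """Stand-in for a validator's comparison of two returned results."""
    return abs(a - b) <= tolerance

def validate_quorum(replicas: list[float]) -> float | None:
    """Accept the first pair of agreeing results as 'canonical'."""
    for i in range(len(replicas)):
        for j in range(i + 1, len(replicas)):
            if results_agree(replicas[i], replicas[j]):
                return replicas[i]
    return None  # no quorum reached

# Two hosts sharing the same systematic flaw (same buggy app build,
# same driver problem) can both return 41.0 when the truth is 42.0.
# The quorum is satisfied and the wrong value becomes canonical; only
# a work unit with a *known* answer would ever expose the error.
true_answer = 42.0
canonical = validate_quorum([41.0, 41.0])
print(canonical, canonical == true_answer)   # prints: 41.0 False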

More interestingly, we do not know what effects other software
components may be having on BOINC science applications while they are
running ... I proved earlier this year that Trac issue #6 (I think it
is), the one about the "Heartbeat" problems (rated "Critical"), is
still alive and well and can allow IBERCIVIS and some other projects'
tasks to cause tasks for other projects to crash ... Bottom line,
software interacts in strange ways that are not predictable in the
lab ...

> Then, once calibrated, the Validator can do the same calibration
> work on user-returned results, deciding which result is good and
> which is wrong.
> There is no need for an enormous waste of resources running such
> calibration tasks on each and every PC joined to the project.

Me personally, I would pay the price ... then again, I am an engineer
and I love knowing the accuracy of what I am doing.  Were I running a
project, you would be running calibration tasks if you were attached,
even if you did not know it ... because if I were not measuring my
error rate, how could I present results with a straight face and say
that I know what they mean?

Running the calibration tasks on each PC is not necessarily needed to
test the speed of the PCs, I agree, but let us not forget the other
part of the point of this ... which is to increase our confidence in
the machines that are calculating our results.  We can debate the
number of machines that need to be tested to obtain higher confidence,
and with experience that number will most likely drop over time ...
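Just to put a rough number on "confidence" (my own back-of-the-envelope
illustration, not anything agreed for the proposal): the classic
"rule of three" says that if n calibration tasks all come back clean,
an approximate 95% upper bound on the hidden per-task error rate is
about 3/n.

# Illustration only: how many clean calibration results are needed
# before we can claim a given upper bound on the hidden error rate.
# Rule of three: with n successes and zero failures, an approximate
# 95% upper confidence bound on the failure probability is 3/n.

def rule_of_three_upper_bound(n_passed: int) -> float:
    """Approximate 95% upper bound on error rate after n clean passes."""
    return 3.0 / n_passed

for n in (30, 300, 3000):
    print(f"{n:5d} clean results -> error rate likely below "
          f"{rule_of_three_upper_bound(n):.3%}")
# 30 clean results   -> below ~10%
# 300 clean results  -> below ~1%
# 3000 clean results -> below ~0.1%

So the amount of sampling needed is modest compared to the total work,
and it can shrink as the evidence accumulates.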

I love the terms applied to the proposal when we have not even agreed
upon the rate of tests, or the extent of the allowable opt-out, or any
other details ... "ENORMOUS"?  How do you know it is enormous?  If we
allow opt-out it is zero ... and the system usually cited to justify
rejection runs one SaH task a week ... which means a single CUDA
system could make up that system's entire contribution in an
afternoon ...

But the real question is this: how large is our hubris in claiming we
know the quality of our answers, when we have bothered to measure so
little in our processes?

To keep saying that the validators will proof the answers, you have to
prove that the validators have never made a mistake ... bug fixes to
the validators prove that they are not infallible ...

> Your approach in this part could be compared to this situation:
> I refuse to use a ruler; I always want to compare every length I
> need to measure against the meter etalon (the reference standard).
> What would become of geometry if each and every measurement could be
> done only after direct comparison with the standard meter?

Walk into my web, said the spider ... sorry, but this is covered in
the proposal and matches standard calibration techniques.  And the
answer is that not all of the extant PCs would be calibrated against
the top-level standard ... but once a PC was calibrated against the
primary standard, it could be used to calibrate the level-2 standards
(in that it would itself be a level-1 standard, calibrated against the
primary) ... level-2 standards would be able to calibrate level 3, and
that is where I would stop ... in effect you have a tree with a
broader spread at each level ...
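For what it is worth, here is a tiny sketch of that tiered scheme
(hypothetical names and fan-out numbers, nothing taken from the actual
proposal text): the primary standard calibrates level 1, level-1 hosts
calibrate level 2, level-2 hosts calibrate level 3, and the chain
stops there.

# Hypothetical sketch of the tiered calibration tree described above.
# A host calibrated at level N can calibrate hosts at level N+1; the
# chain stops at level 3, and each level fans out more broadly.

from dataclasses import dataclass, field

MAX_LEVEL = 3   # no calibrating below this level
FAN_OUT = 10    # assumed number of hosts each standard can vouch for

@dataclass
class Host:
    name: str
    level: int | None = None              # None = not yet calibrated
    calibrated: list["Host"] = field(default_factory=list)

    def calibrate(self, other: "Host") -> bool:
        """Use this host as a standard to calibrate another host."""
        if self.level is None or self.level >= MAX_LEVEL:
            return False                   # not a usable standard
        if len(self.calibrated) >= FAN_OUT:
            return False                   # this standard is fully loaded
        other.level = self.level + 1
        self.calibrated.append(other)
        return True

primary = Host("primary-standard", level=0)   # the project-lab machine
host_a, host_b, host_c = Host("host-A"), Host("host-B"), Host("host-C")

primary.calibrate(host_a)   # host-A becomes a level-1 standard
host_a.calibrate(host_b)    # host-B becomes a level-2 standard
host_b.calibrate(host_c)    # host-C is level 3 -- end of the chain
print(host_c.level, host_c.calibrate(Host("host-D")))   # prints: 3 False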
_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.