> Here is where you and I disagree, because I am looking at a broader
> picture than just in-lab proofs. For one thing, though we prove that
> the software works on the lab computer with your limited tests and
> alpha test, you neglect that well-proven need for the broader beta
> test. And because that external environment is always in flux, I want
> to continue to test in those broad conditions to see how SPs affect
> things, video drivers, and all that other software that we are now
> pretending does not have an effect ... except it does ... GPU Grid is
> having no end of troubles because of issues with the drivers / CUDA
> version and possibly with the GTX260 cards or maybe their BIOS
> software (or whatever they call it)

If one of the participants' computers has a problem, no matter what the reason (it should not be the project's job to investigate why that PC has problems), it will return invalid results. If the PC's problems still allow it to return valid results, then from this particular project's point of view this particular PC HAS NO PROBLEM. It's pretty simple. Whether the problem is in the drivers, in another project running on the same PC, in a voltage drop in the power supply or overclocking, or in gamma rays and so forth - IT JUST DOESN'T MATTER. All that matters is whether this particular PC returns valid results or not.
> Again, you are assuming that a single test or even test suite will
> catch all conditions. I have talked about several potential
> situations where the validator will be happy with two agreeable and
> wrong answers. It is entirely possible that I am wrong that there are
> problems out there

Only to some degree, of course. Unfortunately, NO ONE and NO TEST AT ALL can guarantee the validity of computations. A simple example: you run a calibration task and it runs just fine. Then the PC receives a new task, its CPU is hit by a gamma ray, the voltage drops, or the room temperature rises a little more ... and the CPU does an incorrect addition. So what? Was your calibration task needed? Absolutely not. It just created a false impression about the validity of the next result.

Next, you are completely correct that results which pass the validator are not 100% valid. Completely agreed on that. Moreover, I have seen such a situation myself, when the initial buggy CUDA MB application produced overflow (-9) tasks incorrectly. When two such GPU cards met, an invalid result went into the science database. Unfortunately, there is no way to ensure 100% task validity anyway. All we can do is verify our application under a broader range of conditions - that is the beta project's job. The main project should implement result redundancy, so that statistics help with result validity. Again, IT IS NOT POSSIBLE TO GET A 100% VALID result - with calibration tasks too - simply because some errors are random in nature. Moreover, if a project cannot tolerate some degree of error, it is not a scientific project (well, maybe mathematicians will not agree, but all physicists are completely aware that the input parameters for our models are known only within some error bounds. That is, error is INHERENTLY PRESENT in any computation, even if the computation itself is carried out absolutely correctly with infinite precision (which is impossible too, BTW ;) ).

> More interestingly we do not know what effects other software
> components may be having on BOINC Science

Disagree. We do know that. If a PC with a broken driver or BIOS, or one simply running another project, crashes tasks, the validator will refuse them. Again, yes, it is possible - just with a VERY small probability when the app itself is correct (and proving that is the aim of the beta) - that two different PCs will produce an identical error. As long as there are many different PCs with different software/hardware configurations, the statistical approach with result redundancy will do its work well.

> Bottom line,
> software interacts in strange ways that are not predictable in the
> lab ...

Sure, it's possible. That's why redundancy is mandatory, IMO. In this respect it would be good to have a set of reference PCs doing ordinary work (not test tasks, but the usual workload, so as not to waste electricity) in much more controlled conditions than the average participant's PC, to help with result validation. This idea of Martin's about an etalon PC could improve our faith (sorry, only faith is possible - that is the way all natural sciences live) in result validity, not only make result rewarding more meaningful.

> .. because, if I was not measuring my
> error rate how could I present results with a straight face saying
> that I know what they mean?

The error rate should be measured, no objections on this part. The objection is only to the part where this measurement involves running artificial tasks on all participants' PCs. That is simply unneeded for this purpose. It can be done, and it will give good results, but not better than if it is done in the lab. So it's just unneeded.
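To make the redundancy argument above concrete, here is a minimal sketch of quorum-based validation with a numeric tolerance. It is NOT the real BOINC validator code - the names (ReplicaResult, find_canonical), the tolerance and the quorum size are made up for illustration - but it shows the point: why a host misbehaves never enters the decision, only agreement between independent hosts does.

// Minimal illustrative sketch, not the BOINC validator framework.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct ReplicaResult {
    int host_id;      // which participant PC returned it
    double value;     // the scientific output being compared
};

const double REL_TOLERANCE = 1e-6;  // assumed acceptable numeric spread
const std::size_t MIN_QUORUM = 2;   // replicas that must agree

bool agree(const ReplicaResult& a, const ReplicaResult& b) {
    double scale = std::max(std::fabs(a.value), std::fabs(b.value));
    if (scale == 0.0) return true;
    return std::fabs(a.value - b.value) / scale <= REL_TOLERANCE;
}

// Returns the index of the "canonical" replica if results from enough
// *different* hosts agree, or -1 if another replica must be issued.
int find_canonical(const std::vector<ReplicaResult>& results) {
    for (std::size_t i = 0; i < results.size(); ++i) {
        std::size_t supporters = 1;  // the candidate supports itself
        for (std::size_t j = 0; j < results.size(); ++j) {
            if (j == i || results[j].host_id == results[i].host_id) continue;
            if (agree(results[i], results[j])) ++supporters;
        }
        if (supporters >= MIN_QUORUM) return static_cast<int>(i);
    }
    return -1;  // keep the workunit open
}

Of course two hosts that produce the same wrong answer still slip through - exactly the CUDA overflow case described above - and raising MIN_QUORUM only trades more computation for a smaller probability of that.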
> I love the terms applied to the proposal when we have not even agreed
> upon the rate of tests or even the extent of the allowable opt-out or
> any other details ... "ENORMOUS"?

I said "enormous" based on the following consideration: there are MANY different participant hosts, so anything that has to run on each of those hosts will have enormous computational demands, simply because the number of hosts is huge. I suppose you don't propose to run a single integer addition on each host, right? ;)

> How do you know it is enormous?

Answered above.

> If we allow opt-out it is zero ...

If zero, you assume everyone will opt out? Then why bother with this at all??? Any opt-out will be used only by a small fraction of participants - you should know that perfectly well. Only someone who knows about an option can use it; in this particular case, only those aware of the option will be able to disable it.

> But the real question is that how large is our hubris that we know the
> quality of our answers, when we have bothered to measure so little in
> our processes?

Hm, why such a negative epithet? First of all (it was already answered in this discussion): it is the PROJECT'S RESPONSIBILITY to provide a valid application, NOT BOINC's responsibility. BOINC can provide some tools to help the project check validity, of course (result redundancy is one such means, the validator template in the server part is another), but it just can't do this instead of the project. So your question goes to the project scientists. Of course most of them are not absolutely sure about the algorithms used; they just work with the best they have at the moment. Look at the MW project: they change their computational model pretty often. The same in SETI, in the blanking algorithm, for example. We all know the current approach is imperfect; we know it already, without additional calibration tasks. But right now we have NO OTHER approach to work with. Another approach is in development now. So we have two ways to go: either stop the project completely until the new approach is implemented, or continue with the best we have right now. Of course we chose the second way, simply because we receive good results with the current approach. They are not perfect, they can be made better (and will be ;) ), but they are good enough to do the work right now.

> To continually say that the validators will proof the answers then you
> have to prove that the validators have never made a mistake ... bug
> fixes to the validators prove that they are not infallible ...

Sure, that's just real life. No one is perfect, validators included :)

>> Your approach in this part could be compared with such a situation:
>> I refuse to use a ruler; I always want to compare every length I need to
>> measure against the meter etalon.
>> What would become of geometry if each and every measurement could be
>> done only after direct comparison with the etalon of the meter?

> but once a PC was calibrated against the
> primary standard it can be used to calibrate standards to level 2 (in
> that it would be a level one standard calibrated against the primary
> standard) ... level two standards would be able to calibrate level 3
> and there is where I would stop ... in effect you have a tree with a
> broader spread at each level ...

Well, then I see no difference from Martin's thoughts about an etalon PC, except one: actually, no calibration tasks are needed constantly.
We just need to characterize the etalon PC (yes, by means of calibration/test/artificial tasks); then we use it (a group of such PCs, actually) to calibrate the validator, and the validator "calibrates" the participants' PCs (it does this work already). All I want to say is: you don't need to go down to the last level, the user PCs, to prove the higher levels of the hierarchy. Of course you can have doubts about the validator's correctness, but that should be checked in the lab, not on participants' PCs. They just aren't needed for this.
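A rough sketch of what "calibrating the validator from the etalon PCs" could mean in practice (all names and the safety factor are my assumptions, not anything existing in BOINC): the spread observed among the trusted reference machines on ordinary workunits sets the tolerance the validator then applies to everyone else, so participants' PCs never run calibration tasks at all.

// Illustrative sketch only; names and the safety factor are assumed.
#include <algorithm>
#include <cmath>
#include <vector>

// Relative spread of the same workunit's output across the trusted
// reference ("etalon") machines, measured in the lab.
double reference_spread(const std::vector<double>& reference_values) {
    double lo = *std::min_element(reference_values.begin(), reference_values.end());
    double hi = *std::max_element(reference_values.begin(), reference_values.end());
    double scale = std::max(std::fabs(lo), std::fabs(hi));
    return scale == 0.0 ? 0.0 : (hi - lo) / scale;
}

// The tolerance the validator applies to participant results is derived
// once from the reference machines, with some margin; the lower levels
// of the hierarchy never need artificial calibration tasks.
double derive_validator_tolerance(const std::vector<double>& reference_values) {
    const double SAFETY_FACTOR = 10.0;  // assumed margin over reference spread
    return SAFETY_FACTOR * reference_spread(reference_values);
}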
