On Oct 1, 2009, at 2:35 AM, Raistmer wrote:

>> programming the task will rotate. Since we know what the answer
>> should be we are validating the ENTIRE PROCESS from one end to the
>> other ...

> Ok, you once again described how it should be.
> But I still can't get why it is needed. This is the biggest
> difficulty with your proposal.
> I and probably others just don't see the reason to do this, not how
> or could it be implemented or not, just why it should be implemented
> at all. That is the problem.
> What such entire validation (please, take into account that your
> validation can be breaked just on NEXT TASK, it was pointed few
> times already) give us new? "Entities should not be multiplied
> unnecessarily" as Occam's razor tell us. What part of current BOINC
> validation structure it can replace?
> No one IMO. Well, what part of BOINC validation system it could
> improve being ADDED to current validation system? My answer the
> same, no one. Could you specify what exactly part it can improve,
> taking into account possibility of random errors (I see no
> accounting for this aspect in your posts still).
There are always random errors, and no system of testing is going to catch all problems; issues that arise between tests will not be identified until the next test. The answer to those questions is not to say "don't test at all" ... it is to figure out the minimum we have to do to get the optimum results. I have talked about the errors in various posts over time, but you are correct that I have not talked about them in a concentrated way, and to be honest I am not going to spend much time on it here, because it is pretty clear to me that minds are closed. But there are issues.

The FP number systems we use are approximations. The hardware we use to make our calculations is designed to a standard. However, two computers both running correctly with the "same" software can and do return different results, simply because the IEEE 754 standard defines modes of operation but leaves the selection of those modes to the programmer. Most of the project scientists probably don't even see the implications of the statements I have just made. But, for example, Virtual Prairie is possibly suffering from just that effect. Net result: same program, different results when run on 64-bit vs 32-bit Windows. The probable culprit is either the selection of the rounding mode or the truncation of 80-bit internal values to single or double precision. That is my guess. Though it is possible they are seeing what Milky Way saw: Milky Way has had extensive problems with the cannibalization of precision that occurs because FP numbers are, after all, only approximations ... with sufficient numbers of iterations, loss of precision can devastate the accuracy of a model. I worked with a mathematician, and in fewer than 10,000 iterations we had lost single-precision accuracy down to fewer than 3 significant digits. With careful algorithm work and some fancy footwork they have it under control ...
SaH has historically had some issues with tasks from different platforms, though most don't recall those times; to get from here to there, mostly what they did was reduce the strictness of the validation to something a lot more fuzzy ... Then we have the FDIV class of errors, where large swaths of the computer world had calculators that returned consistent results that were simply not correct. Cross-platform? Suffice it to recall the Cray ... faster than snot, but its FP system was sloppy and inaccurate ... it was fast and good enough to get work done ... but getting values from 68000, G5, G4, and Intel 8086-class machines to all agree ... ugh ... Libraries, compilers, AMD vs Intel, ATI vs Nvidia, CPU vs CUDA (or ATI) ... again, I can go on and on about all the places where we have differences ... and differences generate problems ...

Now, the glib answer is that error is the responsibility of the project. Though no one has really tried to make that case and prove that it is only the project's responsibility ... because you cannot. It is the project's responsibility to manage the error rates and to know them, so that the research can proceed on a sound basis ... but it is properly BOINC's responsibility to do the work, because if something of the same nature is being done by two projects, then it is the proper responsibility of BOINC to provide the tools and technology to manage that aspect. Just as we don't require each project to invent its own database, why would we require them each to build the same infrastructure for systemic error detection and management? The whole point of BOINC is to be the middleware that handles common issues ... as an aside, it fails at this (IN MY OPINION) a lot more often than it should, for any number of reasons, which shows up mostly in the fact that if you watch multiple projects you will see the project teams spending large amounts of time solving the same problems that have already occurred on other projects ... Case in point ...
those recent posts about "I am having problems getting my project started" ... were and are pretty much the same questions that the last group had 6 months ago ... and the group before that ... and the group before that ...

Again, a very incomplete answer, but here goes (again) ... :) SaH is my favorite example, in part because it fits with my technician background ... in essence it is a radio, and the end result we are looking for is a radio station playing our song. But we don't know what our song even looks like ... so, what is wrong with doing multiple things with that proverbial one stone? I start with a known artificial signal, in the simplest form a single pulse, and inject it into one node in our network; it gets processed, and I can see if the system detected the one pulse as expected. If so, as Martin correctly stated, we have one end-to-end test ... now, do that over a bunch of nodes, and at this point we can start to answer several questions ... are they all detecting the one pulse, and if so, what are the limits of error on the returns? There is no noise, so the answers should be identical ... but they won't be ... different OS, different processors, different compilers, on and on ... but now we can start to determine where the sources of these errors are ... and, oh by the way, isolate the machines that are returning just plain bad answers ... the side effect is that we have also done the benchmark.

Again, this is woefully incomplete, and the naysayers have seized on enough trivia and whatever ... computers are good and wonderful tools but are never to be trusted ... sadly, the mindset in BOINC-World is that they are never to be doubted. There is an engineering axiom/rule about known knowns and the like ... the MSM made fun of Rumsfeld for it ... but the point of the whole thing is that, in the end, the only things that get you in trouble are the unknown unknowns ...
and this is an attempt to start to probe those areas to see if there are problems out there ... saying that there aren't doesn't mean there aren't ... and proving the negative case is impossible ... so ... that is the point ... Again, this is an incomplete gloss of the conceptual things to be looked for in that dark area of unknowns ...

>> is needed to establish the operational speed of that machine in
>> CS ... note that the point is to establish this with more
>> reliability with real work on real machines over real execution
>> times so that the instability of the benchmarks is eliminated as
>> an issue.

> Yes, running many of "almost" real tasks will improve credit
> estimation.
> But :
> 1) at too high overhead to be valuable.
> 2) the same could be achieved by usage of REAL tasks and set of
> reference PCs.

Which is why I suggest a suite. The point being that this is not just about one thing. We are making this change to achieve multiple goals, ONE of which is improving the benchmark accuracy so that it is meaningful, which then allows us to establish a stable tie between our benchmark results and credit awards and eventually eliminate the disparities of credit awards across projects. Why should I get 50 CS on project A for an hour's effort when project B grants 75 for the same time? Why should I get 15 when my claim was 35? I would use some generated samples, because the point is to prove that the end-to-end system is doing exactly what it should ... and some real-world samples because, well, they are real world ... and they are always qualitatively different from artificially generated signals ...

>> My personal feeling is that using other mechanisms we can fill in
>> the gap and the current benchmark can be eliminated ...

> Yes, your type of benchmark can replace current benchmark, but will
> have no valuable benefits and will have just bigger overhead.
Which is because there has been so much negativity about one aspect or another without considering the whole. If you want, and some have, you can harp on the fact that the "benchmark" task will take as long as a real task and that this is "waste", if you insist on ignoring the other purposes served. Sadly, this is exactly what has happened. Contrary to the implications, I want to spend as little of the collective resource on this activity as the next person.

>> The point is that instead of requiring the counting of FLOPS or
>> Loops or anything else we establish a generalized earnings rate for
>> a specific computer using a collection of work. The more different
>> work loads we use the more "accurate" our estimate.

> Again, right in base part, indeed, if many particular estimation for
> each project will be replaced by single and more good established
> estimation cross-project credit parity will be improved.
> But again, reference PCs look more preferable for same task than
> reference work.
> They will achieve the same results but with less overhead involved.

>> We make the assumption that the validator will catch errors ... yet
>> we know that the validator is a program written by people. The
>> point is that if I make a SaH signal and the program returns 15
>> signals there is a problem somewhere ... yet if that bad answer is
>> paired up with another bad answer that is the same, the validators
>> will accept both answers. And more and more projects are going to
>> adaptive replication and validating on one task ... so the idea
>> that the redundant computing is going to catch errors is slowly
>> being eroded ...

> 1) I'm against adaptive replication and consider adaptive
> replication experiment on SETI beta totally messed up.
> Adaptive replication lowers our faith in result validity indeed. It
> suffers just from same flaw as your calibration tasks idea - it
> can't account for random error or more or less fast change of
> conditions.
> CPU/GPU fan in absolutely trusted PC can eat too much dust... and we
> will have the same host returning invalid results. How soon we could
> catch this with calibration tasks or with adaptive replication? Not
> very soon. And the faster host is, the more invalid work it will
> produce before catch.
> If and ONLY if project can accept lowering of result correctness
> value in exchange of increasing (please, note, this is trade-off -
> we pay precision for speed) performance, adaptive replication could
> be used.
> And surely if project goes to such measure it needs all power it
> could get and will not waste its fraction to calibration tasks.
>
> 2) I get your point. Please, try to get my.
> If I will make such signal and discover 15 signals instead of one in
> result file, I will immediately post this or E-mail to Eric. Then he
> could reproduce this issue and take measures, including bug-fixes in
> validator. No need to involve all participants PCs in that. I don't
> argue tests and calibrations unneeded, they needed of course! But
> they needed just ON ANOTHER LEVEL of hierarchy.

I get it ... but if all participants were trustworthy there would be no need to check up on them ... though even Reagan said "Trust, but verify." Besides, a million people calling Eric?

>> THIS IS NOT THE COMPLETE PROPOSAL ... there are a myriad of
>> details ... but I know that if I make it longer no one will read
>> it ... but this is the core ...

> Well, details usually go after basic idea approval, no probs with
> that.

No, the arguments have been, historically, about trivial details or hyperbole ...

_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and (near bottom of page) enter your email address.
