On Oct 1, 2009, at 9:17 AM, Martin wrote:
> Paul D. Buck wrote:
>>
>> On Sep 30, 2009, at 6:03 AM, Martin wrote:
>>
>>> OK, this is where Paul's apparent wishes (note: better expressed
>>> as "ideas") and my ideas diverge.
>>>
>>>
>>> Paul is proposing that special "calibration WUs" be passed
>>> through the BOINC system end-to-end for the dual purpose of
>>> calibrating the performance of the client that processed the WU
>>> and of acting as a validation check of the entire BOINC WU data
>>> path.
>>>
>>>
>>> My proposal is that we do just the minimum necessary to calibrate
>>> host performance and credits against a known project lab
>>> reference computer, using the normal pool of live WUs. Then the
>>> only 'wasted' compute time is that needed to characterise the one
>>> (or few) reference computer systems in the lab; everything else
>>> is then compared against them. The calibration is propagated
>>> hierarchically through the participants' hosts, much as is done
>>> for NTP and for NIST standards. At least a small level of WU
>>> redundancy is required so that the calibration can propagate by
>>> comparing hosts that have processed the same WU. The coordination
>>> for this can be done entirely server-side (as part of the
>>> validator?).
>>>
>>>
>>> Sorry Paul, but the end-to-end 'validation' is likely so specific
>>> to each project that it is up to each project to test and prove
>>> the correctness of its BOINC-generated results.
>>
>> The point is that if the client is not returning good results for
>> SaH, it is not that likely to be returning good results for
>> Rosetta either.
>
> That depends on what the 'errors' might be and how they arise.
> Agreed, that is likely to be the case, but not always. Hence each
> project must have its own robust tests and validation for /its/
> processing and analysis.
The point is, again, that the infrastructure to do this is common to
more than one project and thus is properly the province of BOINC.
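
For concreteness, here is a rough sketch of the shared, server-side
piece as I understand your scheme. Every name in it is invented; this
is not the real validator API, just the shape of the idea: seed the
lab reference machines, then let the calibration spread through the
pool whenever two hosts return the same WU.

    // Hypothetical sketch, not the actual BOINC validator API; all
    // names are invented.  Lab reference machines are seeded at
    // stratum 0; every other host is calibrated transitively,
    // NTP-style, from hosts that shared a WU with it.

    #include <map>

    struct HostCal {
        double factor;   // speed relative to the lab reference
        int    stratum;  // 0 = lab reference; higher = further removed
        bool   known;    // calibrated yet?
    };

    std::map<int, HostCal> cal;  // keyed by host id

    void seed_reference(int host_id) {
        cal[host_id] = { 1.0, 0, true };
    }

    // Called by the validator when hosts a and b both returned valid
    // results for the same WU, in elapsed times ta and tb (seconds).
    void on_matched_results(int a, double ta, int b, double tb) {
        HostCal &ca = cal[a], &cb = cal[b];
        if (ca.known && !cb.known) {
            // b did the same work in tb that a did in ta.
            cb = { ca.factor * (ta / tb), ca.stratum + 1, true };
        } else if (cb.known && !ca.known) {
            ca = { cb.factor * (tb / ta), cb.stratum + 1, true };
        } else if (ca.known && cb.known) {
            // Both known: smooth the higher-stratum host toward the
            // one closer to the reference.
            if (ca.stratum < cb.stratum)
                cb.factor = 0.9 * cb.factor + 0.1 * ca.factor * (ta / tb);
            else if (cb.stratum < ca.stratum)
                ca.factor = 0.9 * ca.factor + 0.1 * cb.factor * (tb / ta);
        }
    }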
> There is also the problem of how you can guarantee that a combined
> fault-test-and-calibration WU will actually exercise all fault
> paths thoroughly enough. Otherwise, by your own argument, what can
> we call "thorough enough"? ... Is anything less than 100% fault
> coverage useless?
I cannot, and I never suggested that this would catch all extant
errors. You are setting a standard that no one can reach in order to
reject the idea that we can reach higher levels of confidence: the
perfect being the enemy of the good enough. The point is that the
assertion that x-way redundancy catches all errors is made without
any proof. All we know is that it caught y errors; we have no idea
how many other potential errors slipped by. John is absolutely
correct, and I concede the point, that SOME projects can validate
back to the problem; I never said that some projects could not. SaH
cannot, and neither can Einstein or CPDN ... Rosetta? Well, with a
lab you can ...
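
To put rough numbers on that, here is a toy model of 2-way
redundancy; the rates in it are made up, purely for illustration.
The mismatches are what the validator sees; the agreeing-but-wrong
pairs are invisible to it, which is exactly the point.

    // Back-of-envelope illustration, not a claim about any project's
    // actual error rates: with 2-way redundancy an error slips
    // through only when BOTH replicas are wrong AND wrong in the same
    // way.  Validation reports the disagreements it caught; it cannot
    // report the agreements it silently passed.

    #include <cstdio>

    int main() {
        double p = 1e-3;  // assumed per-result error rate (made up)
        double c = 0.1;   // assumed chance two wrong results agree (made up)

        // mismatch: exactly one wrong, or both wrong but different
        double caught = 2 * p * (1 - p) + p * p * (1 - c);
        // both wrong, and wrong in the same way
        double missed = p * p * c;

        std::printf("caught by redundancy: %g\n", caught);
        std::printf("silently accepted:    %g\n", missed);
        return 0;
    }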
> I'm also working from the premise that it is unrealistic to
> abstractly characterise or instrument live WUs accurately enough to
> be useful. Hence, don't waste developer time trying that. Instead,
> use the real-world representative example that /is/ a computer
> system, which can then be readily benchmarked in the lab in any way
> that might be chosen.
If this method worked, we would never need beta tests, and software
products released after beta would be flawless ... neither is the
case.
> Aside: A numerical test WU could be useful as a test case during
> development to highlight the differences that can be expected
> across different host platforms. A sort of Numerical Recipes test
> example. Note: I don't think that would be of any use as a
> 'special' calibration WU; that's not needed unless you want to
> claim some benchmark for the reference _computer_.
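
Agreed that such a development test case could be useful. Even
something as trivial as the toy below (illustration only) typically
shows the kind of divergence involved: summation order alone changes
the last bits, and x87 vs SSE, FMA, or -ffast-math widen the gap.

    // Toy example of cross-platform numerical divergence a test WU
    // could surface: the same mathematically-equivalent sum,
    // accumulated in opposite orders, usually disagrees in the last
    // few bits of a double.

    #include <cstdio>

    int main() {
        double up = 0.0, down = 0.0;
        for (int i = 1; i <= 1000000; i++) up   += 1.0 / i;
        for (int i = 1000000; i >= 1; i--) down += 1.0 / i;
        std::printf("forward:  %.17g\n", up);
        std::printf("backward: %.17g\n", down);
        std::printf("delta:    %.3g\n", up - down);
        return 0;
    }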
>
>
> [...]
>> The extension from your suggestion to mine is so minimal that,
>> when I started with your thought as expressed above, I made the
>> natural extension.
>
> However, that leads to a very different point of reference. I don't
> think that we can realistically contrive pan-dimensional project
> fault-test-and-calibration WUs that would be any more accurate than
> the existing whetstone-dhrystone benchmarking. There will always be
> another project that uses the host hardware /differently/ ...
Again you are making perfect the enemy of good enough.
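
For context, the estimate any calibration scheme only has to beat
is, from memory, roughly the sketch below; check the actual source
before quoting the constants. Its weakness is the one you name:
whetstone and dhrystone measure the benchmarks, not how a given
project's code uses the hardware.

    // Rough sketch of the classic benchmark-based claimed credit
    // (from memory; verify against the real source).  The intent: a
    // 1 GFLOPS / 1 GIPS host running for a day claims ~100
    // cobblestones.

    #include <cstdio>

    double claimed_credit(double cpu_time_sec,
                          double p_fpops,   // whetstone result, FLOPS
                          double p_iops) {  // dhrystone result, IOPS
        double avg_gops = (p_fpops + p_iops) / 2.0 / 1e9;
        return 100.0 * avg_gops * cpu_time_sec / 86400.0;
    }

    int main() {
        // Hypothetical host: 2.5 GFLOPS whetstone, 5 GIPS dhrystone.
        std::printf("claim for 1h of CPU: %.2f\n",
                    claimed_credit(3600, 2.5e9, 5e9));
        return 0;
    }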
> Hence, use the hardware itself as your ("etalon" ;-) "golden")
> reference standard.
>
>
> [...]
>> no science value whatsoever ... I'm sorry ... but I love irony ...
>
> Good to see that you have some sort of sense of humour :-)
Sadly it is the only one I have ... it took my spouse 15 years
before, she says, she could reliably tell when I was telling a joke.
> Occam's Razor is a good idea. Try to avoid feature overload! Keep
> it as simple as possible. Simple calibration should be enough.
> Other methods are available and are better for validating and
> testing whatever WU results are wanted.
I did. Again, you miss the point: I was after more than just
benchmarking.