Some good comments there.

The first question is whether we calibrate to one real piece of 
hardware, or whether we calibrate to the performance of mythical 
hardware that is somehow averaged from the performance of various 
bits of hardware.

The "mythical" hardware has the advantage that no hardware is needed 
in-lab. We just use an average over a "trusted" subset of participants' 
hosts.

Using a single piece of hardware in-lab has the advantage of being 
completely trusted and in a controlled environment. We can also run 
whatever benchmarking tests are needed or wished for.

If using participants' hosts to form a mythical "etalon computer" as 
the reference, are we in danger of falling victim to unexpected side 
effects from whatever operating system and/or drivers are presently 
dominant? And to shifts in the dominant mix of systems? And to side 
effects of how the averaging itself is done?

I could well imagine the averaging producing some nonsense 
impossibilities when you mix in CPUs, CPUs + HT & virtualisation, GPUs, 
PowerPCs, and any other architectures. Do we really want an 
indeterminate mongrel as a reference standard?
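To illustrate with invented numbers: mix one GPU-class host into a small 
fleet of home-PC CPUs, and the "reference computer" you get depends 
entirely on which mean you pick:

```python
# Invented speeds: three home-PC CPUs plus one GPU host, in GFLOPS.
speeds_gflops = [1.0, 1.5, 2.0, 500.0]

# Arithmetic mean: dominated by the GPU outlier.
arithmetic_mean = sum(speeds_gflops) / len(speeds_gflops)

# Harmonic mean: dominated by the slow CPU hosts instead.
harmonic_mean = len(speeds_gflops) / sum(1.0 / s for s in speeds_gflops)

print(arithmetic_mean)  # 126.125
print(harmonic_mean)    # ~1.84
```

Two "references" over two orders of magnitude apart, from exactly the 
same hosts. The mythical etalon is partly an artefact of the averaging 
method chosen.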

I think we must accept for example that GPUs and array processors work 
very differently to home PC CPUs, and that they have very different 
processing characteristics.


With the in-lab trusted and controlled environment references, we can 
exactly document and measure any changes made to that reference...

For an in-lab reference, that does suggest that some "Boinc-HQ" must 
provide a central reference service to maintain cross-project credit 
parity automatically.

Would that also make life easier when the more comprehensive credit 
measures are introduced, to count the resources used beyond just 
s...@h-flops?


For myself, I much prefer the certainty of scientifically rigorous 
direct measurement over a complex statistical guess...

Regards,
Martin


ps: No thesaurus needed for that last sentence :-) Apologies if it 
breaks the Google/Babel translators! :-(



Lynn W. Taylor wrote:
> Martin wrote:
> 
>> Hence, reference against /present day/ hardware to allow for the new 
>> performance enhancements in the newer hardware?... The present day 
>> reference can be still calibrated to stay in line with whatever older 
>> hardware was used for the reference system as newer hardware is 
>> brought into use.
>>
>> Note that we can stay with the Cobblestones benchmark as is. However, 
>> we can also benchmark the (in lab) reference computer with any other 
>> benchmarks of interest and by virtue of the propagated calibration 
>> across all hosts, we will be able to say something meaningful about 
>> how that benchmark relates to Boinc as a whole.
> 
> Just thinking out loud here.
> 
> Whetstones and Dhrystones share a problem with every other synthetic 
> benchmark: they're synthetic.
> 
> So, in a sense, we've got a 1980's era benchmark, but the true "index" 
> is early 2000's hardware and how it completes the "old" benchmark.
> 
> Which isn't the same as indexing to late-1980's hardware -- we're 
> indexing to early-BOINC-era hardware, as measured on an old "Etalon."
> 
> No problem.
> 
> Calculating the benchmark * time credit is right straight from the 
> definition.  I'm not sure we need to index it at all.  It may be a 
> little odd, but it's odd by definition.
> 
> What if we had a fleet of designated machines that make up the standard? 
>  They'd be purchased to be representative of the current fleet: some 
> fast, some slow, the only criteria is that they all be "measurable" 
> machines -- no GPUs.  Some AMD, some Intel.  Atoms to i7's.
> 
> Calculate the "benchmark * time" credit, compare that to the average 
> number of FLOPs and you've got the conversion factor based on your 
> reference fleet.
> 
> Okay, now we've got our dozen reference machines.  Let's find a few 
> dozen machines "out there" that behave identically.  More because there 
> needs to be a way to detect changes.
> 
> Now we have a reference fleet without having to own it.
> 
> We've got an average cobblestone credit for that group calculated using 
> the definition benchmark * time.  Compare that to the average number of 
> FLOPs and we've now got the same value based on our "virtual" fleet.
> 
> Which is nearly the same as what Eric's script does.  Probably within a 
> percent or two.
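
For concreteness, the "benchmark * time" credit and the conversion 
factor described above might be sketched like this. The 
200-credits-per-GFLOPS-day constant is my reading of the Cobblestone 
definition, and the fleet numbers are invented:

```python
SECONDS_PER_DAY = 86400.0
COBBLESTONES_PER_GFLOPS_DAY = 200.0  # assumed Cobblestone constant

def benchmark_time_credit(benchmark_gflops, cpu_seconds):
    """Credit straight from the definition: benchmark * time."""
    gflops_days = benchmark_gflops * cpu_seconds / SECONDS_PER_DAY
    return gflops_days * COBBLESTONES_PER_GFLOPS_DAY

# Hypothetical reference fleet:
# (benchmark GFLOPS, cpu seconds, actual FLOPs performed)
fleet = [
    (1.2, 3600.0, 3.9e12),
    (4.0, 3600.0, 1.3e13),
    (0.5, 7200.0, 3.2e12),
]

avg_credit = sum(benchmark_time_credit(b, t) for b, t, _ in fleet) / len(fleet)
avg_flops = sum(f for _, _, f in fleet) / len(fleet)

# Conversion factor: credits per FLOP, anchored to the reference fleet.
credits_per_flop = avg_credit / avg_flops
```

One sanity check: a 1 GFLOPS host running flat out for a full day gets 
exactly 200 credits, which matches the Cobblestone definition if I have 
the constant right.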


-- 
--------------------
Martin Lomas
m_boincdev ml1 co uk.ddSPAM.dd
--------------------
_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.
