Unfortunately, at the end of the day, replacing the benchmark with a 
reference work unit is just replacing one arbitrary benchmark with a 
different arbitrary benchmark.

The problem with the existing benchmark is that the benchmark code 
doesn't represent the instruction mix for the project.

When you say "We'll use a specific SETI@home work unit as a 'reference' 
work unit" you have the same problem: the instruction mix does not match 
any other project.
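
To make the arbitrariness concrete, here is roughly what the two schemes 
boil down to.  This is an illustration only -- not the actual BOINC 
credit code, and the function and parameter names are invented:

    // Illustration only: both schemes scale a measured quantity by an
    // arbitrary reference, and the choice of reference is exactly where
    // the cross-project disagreement comes from.

    // Scheme 1: classic benchmark-based credit.  host_benchmark_flops is
    // whatever the synthetic benchmark reported for this host; it
    // reflects the benchmark's instruction mix, not the application's.
    double benchmark_credit(double cpu_time_sec, double host_benchmark_flops,
                            double credit_per_flop) {
        return cpu_time_sec * host_benchmark_flops * credit_per_flop;
    }

    // Scheme 2: reference-work-unit credit.  ref_wu_time_sec is how long
    // the chosen "reference" job took on this host; it reflects that
    // job's instruction mix, not any other project's.
    double reference_wu_credit(double cpu_time_sec, double ref_wu_time_sec,
                               double credit_per_ref_job) {
        return (cpu_time_sec / ref_wu_time_sec) * credit_per_ref_job;
    }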

The instruction mix likely doesn't even match from Multibeam to Astropulse.

In fact, one criticism of the benchmark is that it fits in the cache of 
virtually every modern processor.  Multibeam fits in the cache on 
high-end processors but not on low-end ones.

.... meaning processor architecture still matters a lot more than we'd like.
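
A crude way to see the cache effect is to time the same arithmetic over 
working sets of different sizes.  The sketch below (illustrative only, 
not project code) reports a much lower FLOPS figure once the arrays stop 
fitting in cache, and where that cliff falls depends entirely on the 
processor:

    // Same arithmetic, different working-set sizes: the reported "speed"
    // of a host depends on whether the data fits in its cache.
    #include <chrono>
    #include <cstdio>
    #include <vector>

    int main() {
        for (size_t n : {1 << 13, 1 << 16, 1 << 20, 1 << 24}) {
            std::vector<float> a(n, 1.0f), b(n, 2.0f);   // 8*n bytes total
            const int passes = 8;
            double sum = 0;
            auto t0 = std::chrono::steady_clock::now();
            for (int p = 0; p < passes; p++)
                for (size_t i = 0; i < n; i++)
                    sum += a[i] * b[i];                  // 2 flops per element
            auto t1 = std::chrono::steady_clock::now();
            double sec = std::chrono::duration<double>(t1 - t0).count();
            std::printf("%8zu KB working set: %8.1f MFLOPS (checksum %g)\n",
                        n * 8 / 1024, 2.0 * passes * n / sec / 1e6, sum);
        }
        return 0;
    }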

Can the credit system be improved?  Yes.  Is working out the credit 
multiplier difficult?  Yes.  Is it possible to devise a credit system 
with perfect cross-project parity?  No.

-- Lynn

Paul D. Buck wrote:
> Though it looks like the conversation died down again ... I think
> there are a couple of points yet to be made.
> 
> If I had one and only one objection to make, it is that this system
> seems to be based on the benchmarking system without any attempt to
> correct for its deficiencies (as best I can tell).  To my mind the
> worst feature of the benchmarks was not that they were inaccurate,
> but that they cannot be replicated.  Repeated runs, even on quiescent
> systems, can report results with a spread of as much as 20%.
> 
> I am happy to see the concept of a "reference job", as that was the
> cornerstone of the proposal I made for using calibration to quantify
> and test our systems in the BOINC universe.  See:
> http://www.boinc-wiki.info/Improved_Benchmarking_System_Using_Calibration_Concepts
> 
> I still see SaH as one of the "best" sources in that the source code
> is public and probably the best understood.  Most importantly, it
> should be relatively easy to make test tasks by hand with known
> characteristics, and to a great extent perhaps even to run them
> through instrumented code so that precise counts of FLOPS could be
> made.
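
As a concrete (and entirely invented) illustration of the kind of 
instrumentation Paul describes above -- this is not SaH code -- a 
hand-counted kernel might look something like this, so that a test task 
built by hand yields an exact, replicable FLOP total to calibrate 
against:

    // Sketch of "instrumented" FLOP counting: every floating-point
    // operation in the kernel bumps a counter, so a hand-built test task
    // with known inputs produces an exact FLOP total.
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    static uint64_t g_flop_count = 0;     // tally for this test run

    // Toy "analysis" kernel: a dot product with explicit operation counts.
    double instrumented_dot(const std::vector<double>& x,
                            const std::vector<double>& y) {
        double acc = 0.0;
        for (size_t i = 0; i < x.size(); i++) {
            acc += x[i] * y[i];           // one multiply + one add
            g_flop_count += 2;
        }
        return acc;
    }

    int main() {
        std::vector<double> x(1000, 1.5), y(1000, 2.0);
        double r = instrumented_dot(x, y);
        std::printf("result=%g, counted FLOPs=%llu\n",
                    r, (unsigned long long)g_flop_count);
        return 0;
    }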
> 
> An assumption is made that the GPU versions will be more efficient.
> I think AQUA found that the converse is true (I do not know this for
> sure; it was in a post I read the other day discussing projects with
> GPU applications, which said they dropped the GPU version because it
> was worse than the multi-threaded CPU version).
> 
> It may be that I am too dense to get it, but I also do not see how
> this proposal would adequately address the quality metrics we might
> extract from projects whose applications span the available types and
> classes of computing resources.  For example, the two "best" projects
> at this time are MilkyWay and Collatz, in that they have applications
> that span all three of the currently available computing resources:
> CPU, Nvidia CUDA, and ATI Stream.
> 
> And finally, the issue of optimized applications vs. "stock"
> applications ... the hardware will report the same FLOPS, but it
> seems to me the faster execution time of the optimized application
> will cause problems.
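
A back-of-the-envelope illustration of that concern (the numbers are 
invented): if claimed credit scales with benchmark speed times run time, 
then

    stock app:      10 GFLOPS benchmark x 10,000 s  ->  claims X credit
    optimized app:  10 GFLOPS benchmark x  5,000 s  ->  claims X/2 credit

i.e. the optimized application does exactly the same science but claims 
half the credit, simply because it finished sooner.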
> 
> Oops, two more finallies: you would require a change to all science
> applications to make this effective, and you still require the
> projects to make an initial estimate regardless of its accuracy (the
> predicted number of app units).
> 
> On Aug 28, 2009, at 12:45 PM, David Anderson wrote:
> 
>> I'm coming around to the viewpoint that projects shouldn't be expected
>> to supply estimates of job duration or application performance.
>> I think it's feasible to maintain these estimates dynamically,
>> based on actual job runtimes.
>> I've sketched a set of changes that would accomplish this:
>> http://boinc.berkeley.edu/trac/wiki/AutoFlops
>> Comments welcome.
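
For what it's worth, one simple way a server could keep such estimates 
current -- this is only a sketch with invented names, not necessarily 
what the AutoFlops page proposes -- is a running average over completed 
jobs per application version:

    // Sketch: per-app-version duration estimator fed by actual runtimes,
    // so projects never have to hand-tune rsc_fpops_est themselves.
    struct DurationEstimator {
        double avg_norm_sec = 0;   // running average of normalized runtime
        long   njobs = 0;

        // Called when a job finishes; host_speed_factor is a rough measure
        // of how fast this host is relative to an average host.
        void job_completed(double elapsed_sec, double host_speed_factor) {
            double normalized = elapsed_sec * host_speed_factor;
            avg_norm_sec = (njobs == 0)
                ? normalized
                : 0.9 * avg_norm_sec + 0.1 * normalized;  // exponential average
            njobs++;
        }

        // Predicted runtime of the next job on a given host.
        double estimate_for(double host_speed_factor) const {
            return avg_norm_sec / host_speed_factor;
        }
    };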
>>
>> BTW, a bonus of the proposed design is that it provides
>> a project-independent credit-granting policy.
>>
>> -- David
>>
>> Richard Haselgrove wrote:
>>> ...  if projects
>>> are expected to fine-tune performance metrics down to the individual
>>> plan_class level, then I'm sorry, but they just won't. I've had to  
>>> shout
>>> (loudly and repeatedly) at both AQUA and GPUGrid to get them to  
>>> adjust
>>> rsc_fpops_est to within an order of magnitude of reality (in AQUA's  
>>> case,
>>> two orders of magnitude).
> 
