then comes L1/L2/L3 cache occupancy at the moment the "ref unit" is run, and so on. Performance depends on the workload, the code optimization level, and even the room temperature (in the case of a notebook with SpeedStep enabled), and no benchmark can change this.
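
For anyone who wants to see the operation-mix effect from the quoted message on their own machine, here is a minimal sketch in Python. The helper name `time_ops` and the operand values are made up for illustration. Interpreter overhead dominates in Python, so the ratios will be much flatter than what a compiled test program reports, and they will shift between runs for the reasons given above (cache state, clock scaling, other load).

```python
import timeit


def time_ops(n=200_000):
    """Time n repetitions of three floating point operations.

    Hypothetical helper for illustration only. Returns a dict mapping
    the operation name to the elapsed wall-clock time in seconds.
    """
    # Operands are variables defined in the setup code, so the
    # expressions are evaluated on every iteration, not folded away.
    setup = "import math; x = 1.2345; y = 6.789"
    stmts = {
        "add": "x + y",
        "divide": "x / y",
        "cosine": "math.cos(x)",
    }
    return {
        name: timeit.timeit(stmt, setup=setup, number=n)
        for name, stmt in stmts.items()
    }


if __name__ == "__main__":
    times = time_ops()
    for name, secs in times.items():
        # Relative cost versus add; varies from run to run and machine to machine.
        print(f"{name:7s} {secs:.4f} s  ({secs / times['add']:.1f}x add)")
```

Running it a few times in a row, or with a different `n`, usually gives visibly different ratios, which is the point: a single "flops" figure hides all of this.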
----- Original Message -----
From: Nicolás Alvarez
To: [email protected]
Sent: Tuesday, September 22, 2009 12:11 AM
Subject: Re: [boinc_dev] [boinc_alpha] Card Gflops in BOINC 6.10

On Monday 21 Sep 2009 15:04:08 Lynn W. Taylor wrote:
> It's hard to find a proper table (they used to be published), but this
> may help illustrate the problem:
>
> <http://www.obliquity.com/computer/speedtest.html>
>
> If the example program on the page is accurate (and I suspect it is) a
> floating point add is roughly three times faster than a floating point
> divide, and more than ten times faster than a floating point cosine.
>
> Yet, we count all floating point instructions as one flop.
>
> A different processor might be slow to add, or do cosine relatively fast.

And what about pipelines? Adding 10 numbers and *then* calculating 10 cosines may be slower than interleaving the adds and cosines, because in the latter case there is a higher chance the CPU can do both operations at the same time.

--
Nicolas
_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and (near bottom of page) enter your email address.
