Serge Belyshev wrote:

"Vladimir N. Makarov" <[EMAIL PROTECTED]> writes:

I run SPEC2000 several times per week and always look at 3 runs (to be
sure that nothing went wrong), but I have never seen such big
"confidence" intervals (as I understand it, that is the difference between
the max and min of 3 runs divided by the score). [...]

No, it is much more complex than that. I've used the generally accepted
definition of a confidence interval, see
http://en.wikipedia.org/wiki/Confidence_interval
which basically says that, with 95% probability (the confidence level I've
chosen), the true value lies in this interval.

I've used a conservative estimate of the confidence intervals in this case
because I didn't assume a Gaussian distribution of the numbers I reported
(the difference between two run times), and this estimate is somewhat
bigger than the difference between the max and min of 3 runs :)
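The message does not say which conservative estimate was used. One common distribution-free choice is Chebyshev's inequality, which bounds the probability that the sample mean lies more than k standard errors from the true mean by 1/k², so a 95% level needs k = √20 ≈ 4.47. The sketch below is an assumption, not necessarily the method used here: it compares the "(max - min) / score" spread described above with the relative width of a Chebyshev-style 95% interval. The run times are hypothetical, and using the sample standard deviation in place of the true one is itself an approximation.

```python
import math
import statistics


def range_fraction(times):
    """Spread of the runs as described above: (max - min) / mean."""
    return (max(times) - min(times)) / statistics.mean(times)


def chebyshev_halfwidth(times, level=0.95):
    """Distribution-free half-width of an interval for the mean.

    Chebyshev: P(|mean - mu| >= k * sigma / sqrt(n)) <= 1 / k**2,
    so k = sqrt(1 / (1 - level)). The sample standard deviation is
    plugged in for sigma (an approximation, not an exact bound).
    """
    n = len(times)
    k = math.sqrt(1.0 / (1.0 - level))
    return k * statistics.stdev(times) / math.sqrt(n)


# Hypothetical run times in seconds for one benchmark (not real data).
runs = [100.0, 101.0, 102.0]
mean = statistics.mean(runs)
print(f"range / mean          : {range_fraction(runs):.4f}")
print(f"interval width / mean : {2 * chebyshev_halfwidth(runs) / mean:.4f}")
```

With only 3 runs the distribution-free interval comes out wider than the max-min spread, which is the price of not assuming a normal distribution.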

Well, you should have written all this the first time: that you chose 95% probability (although it is the most widely used level for confidence intervals) and did not assume a normal distribution, so that people could better interpret all these numbers. Not everyone knows statistics, and some studied it long ago.

Still, the numbers are a bit useless, at least for me. What I (and, I guess, most people) wanted was just the overall score (with the confidence interval if you want, although that would take more time).

On the other hand, it is good that you calculated and reported the confidence intervals and that I asked about them. Now I understand that this machine (or maybe all AMD machines, according to Jan) cannot be used to check GCC performance progress. According to Proebsting's law (a pseudo-law that is the compiler analog of Moore's law), compilers generate code that is 2 times better every 18 years, i.e. less than 4% per year on average. Actually, our progress from one release to another is sometimes less. So I need a better tool to measure progress over short intervals of time (a few months). Fortunately I have one: a Core2 machine (the Itanium and my PPC machine are also accurate, but not as accurate as the Core2).
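The "less than 4% per year" figure follows from compounding: a factor of 2 over 18 years is 2^(1/18) - 1 per year. A small Python sketch of that arithmetic, also showing the gain to expect over a hypothetical 4-month window (the window length is an assumption chosen for illustration), which is roughly the size of effect a benchmark machine must be able to resolve:

```python
def annual_rate(factor=2.0, years=18.0):
    """Compound yearly improvement implied by a `factor` speedup over `years`."""
    return factor ** (1.0 / years) - 1.0


rate = annual_rate()  # Proebsting's law: 2x every 18 years
print(f"per year: {rate:.2%}")

months = 4  # hypothetical measurement window
gain = (1.0 + rate) ** (months / 12.0) - 1.0
print(f"per {months} months: {gain:.2%}")
```

The yearly rate comes out at about 3.9%, and the 4-month gain at around 1.3%, so run-to-run noise on the test machine has to be well below 1% for such progress to be visible.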

A lot of people say that benchmarks differ and that benchmarking is evil, and I agree with them. But it is better to have some rules than nothing. I like SPEC because it is the most acknowledged benchmark in the compiler world and it is not easy to cheat with it (as one could by choosing a single random benchmark like Whetstone and reporting an improvement).

[...] If the machine has only 512 MB of memory (even though they
write that it is enough for SPEC2000), the scores for some benchmark
programs may be unstable.  [...]

My box is equipped with 2 GB of RAM, so I believe this is not the case.
Also, the computer was *absolutely* idle when it was running spec2k
(booted with init=/bin/sh, and no other processes were running).

It might be the motherboard, the chipset, or a lot of other parameters that make this machine unsuitable for measuring GCC performance progress.
