Serge Belyshev wrote:
"Vladimir N. Makarov" <[EMAIL PROTECTED]> writes:
I run SPEC2000 several times per week and always look at 3 runs (to be
sure that nothing went wrong), but I have never seen such big
"confidence" intervals (as I understand it, that is the difference between
the max and min of 3 runs divided by the score). [...]
No, it is much more complex than that. I've used the generally accepted
definition of a confidence interval, see
http://en.wikipedia.org/wiki/Confidence_interval
which basically says that, with 95% probability (the confidence level I
chose), the true value lies in this interval.
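To see how a textbook 95% interval for 3 runs compares with the simpler "(max - min) / score" spread, here is a minimal sketch. It assumes normally distributed run times (the standard Student's t interval, which is *not* the non-Gaussian estimate described here), uses hard-coded t critical values for small sample counts, and runs on hypothetical timings rather than real SPEC results.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t distribution, keyed by
# degrees of freedom (n - 1). Only small n is tabulated in this sketch.
T95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}

def t_confidence_interval(samples):
    """Return (mean, half_width) of a 95% CI for the mean, assuming normality."""
    n = len(samples)
    if n - 1 not in T95:
        raise ValueError("this sketch only tabulates t for 2..6 samples")
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / math.sqrt(n)  # standard error of the mean
    return mean, T95[n - 1] * sem

def relative_spread(samples):
    """(max - min) / mean: the simpler spread measure mentioned in the thread."""
    return (max(samples) - min(samples)) / statistics.mean(samples)

# Hypothetical run times in seconds for one benchmark (not real data).
runs = [100.0, 101.0, 102.0]
mean, hw = t_confidence_interval(runs)
print(f"mean={mean:.2f}s, 95% CI = +/-{hw:.2f}s ({100 * hw / mean:.2f}%)")
print(f"(max-min)/mean = {100 * relative_spread(runs):.2f}%")
```

With only 3 runs the t multiplier is large (about 4.3), so even under the Gaussian assumption the half-width alone already exceeds the max-min spread of these sample timings.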
I've used a conservative estimate of the confidence intervals in this case
because I didn't assume a Gaussian distribution for the numbers I
reported (the difference between two run times), and this estimate is
somewhat bigger than the difference between the max and min of 3 runs :)
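The exact distribution-free method is not stated here, so the following is only an illustration of the general idea using a swapped-in technique: Chebyshev's inequality, which bounds P(|X - mu| >= k*sigma) <= 1/k**2 for any distribution, giving k = 1/sqrt(1 - 0.95), about 4.47, at 95%. The paired timings are hypothetical, and plugging in the sample standard deviation for the unknown sigma makes this a rough bound rather than a rigorous one.

```python
import math
import statistics

def chebyshev_half_width(samples, confidence=0.95):
    """Half-width of a distribution-free interval for the mean via Chebyshev's
    inequality: P(|X - mu| >= k*sigma) <= 1/k**2, so k = 1/sqrt(1 - confidence).
    The sample standard deviation stands in for the unknown sigma, so this is
    only a rough, illustrative bound."""
    k = 1.0 / math.sqrt(1.0 - confidence)
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return k * sem

# Hypothetical paired run times (seconds) for compiler A and compiler B.
a = [100.0, 101.0, 102.0]
b = [98.0, 100.0, 100.0]
diffs = [x - y for x, y in zip(a, b)]  # per-run difference A - B
print(f"mean difference = {statistics.mean(diffs):.2f}s "
      f"+/- {chebyshev_half_width(diffs):.2f}s")
```

Because no distribution shape is assumed, the multiplier is larger than the Gaussian one, which is why such an estimate tends to come out wider than the max-min spread.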
Well, you should have written all this the first time (that you chose
95% probability, although it is the most widely used level for
confidence intervals, and that you did not assume a normal distribution)
so that people could interpret all these numbers better. Not everyone
knows statistics, and some of us studied it long ago.
Still, the numbers are a bit useless, at least for me. What I (and, I
guess, most people) wanted was just the overall score (with the confidence
interval if you want, although that would take more time).
On the other hand, it is good that you calculated and reported the
confidence intervals and that I asked about them. Now I understand that
this machine (or maybe all AMD machines, according to Jan) cannot be used
to check gcc performance progress. According to Proebsting's law (a
pseudo-law that is the compiler analog of Moore's law), compilers
generate 2 times better code every 18 years, i.e. less than 4% on average
per year. Actually, our progress from one release to another is
sometimes even less. So I need a better tool to measure progress over a
short interval of time (a few months). Fortunately I have one: a Core2
machine (the Itanium and my PPC machine are also accurate, but not as
accurate as the Core2).
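The Proebsting's-law arithmetic can be checked directly: doubling every 18 years means a yearly factor of 2**(1/18), roughly 3.9% per year, and over a four-month window (an illustrative choice for "a few months") only about 1.3%. The helper name below is hypothetical; it is a sketch of the compounding calculation, not an established tool.

```python
def improvement_over(months, doubling_years=18.0):
    """Fractional speedup implied by code quality doubling every
    `doubling_years` years, compounded over `months` months."""
    return 2.0 ** (months / (12.0 * doubling_years)) - 1.0

print(f"per year:     {100 * improvement_over(12):.2f}%")
print(f"per 4 months: {100 * improvement_over(4):.2f}%")
```

A measurement setup whose run-to-run noise is comparable to 1% therefore cannot resolve the expected progress over a few months, which is the point behind needing a more stable machine.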
I know a lot of people say, and I agree with them, that benchmarks
differ and that benchmarking is evil. But it is better to have some rules
than nothing. I like SPEC because it is the most acknowledged benchmark in
the compiler world and it is not easy to cheat with it (as one could by
choosing a single random benchmark like Whetstone and reporting an
improvement).
[...] If the machine has only 512 MB of memory (even though they
write that it is enough for SPEC2000), the scores for some benchmark
programs may be unstable. [...]
My box is equipped with 2 GB of RAM, so I believe this is not the case.
Also, the computer was *absolutely* idle when it was running spec2k
(booted with init=/bin/sh, with no other processes running).
It might be the motherboard, the chipset, or a lot of other parameters
that make the machine unsuitable for tracking gcc performance progress.