Re: spec2k comparison of gcc 4.1 and 4.2 on AMD K8

Vladimir N. Makarov Sun, 25 Feb 2007 15:56:09 -0800

Serge Belyshev wrote:

"Vladimir N. Makarov" <[EMAIL PROTECTED]> writes:

I run SPEC2000 several times per week and always look at 3 runs (to be
sure that nothing wrong happened) but I never saw such big
"confidence" intervals (as I understand, that is the difference between max
and min of 3 runs divided by the score). [...]


No, it is much more complex than that. I've used the generally accepted
definition of a confidence interval, see
http://en.wikipedia.org/wiki/Confidence_interval
which basically says that with 95% probability (the confidence level I've
chosen) the true value lies in this interval.

I've used a conservative estimate of the confidence intervals in this case
because I didn't assume a Gaussian distribution of the numbers, which I
reported as the difference between two run times, and this estimate is somewhat
bigger than the difference between max and min of 3 runs :)
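
To make the two measures being discussed concrete, here is a minimal Python sketch. It is not the method actually used for the reported numbers, which the message does not specify. It assumes the "score" is the median of the runs, and it uses Chebyshev's inequality as one possible distribution-free (and therefore conservative) way to get a 95% interval for the mean. The function names and the run times are made up for illustration.

```python
import math
import statistics

def relative_spread(runs):
    """(max - min) / score, taking the median of the runs as the score (an assumption)."""
    return (max(runs) - min(runs)) / statistics.median(runs)

def chebyshev_interval(runs, confidence=0.95):
    """Conservative interval for the mean that does not assume a normal distribution.

    Chebyshev's inequality bounds P(|mean - mu| >= k * sigma / sqrt(n)) by 1 / k**2,
    so k = 1 / sqrt(1 - confidence). The sample standard deviation stands in
    for the unknown sigma, which makes this an approximation.
    """
    n = len(runs)
    mean = statistics.mean(runs)
    s = statistics.stdev(runs)
    k = 1.0 / math.sqrt(1.0 - confidence)
    half_width = k * s / math.sqrt(n)
    return mean - half_width, mean + half_width

runs = [100.0, 101.0, 102.0]  # hypothetical run times in seconds
low, high = chebyshev_interval(runs)
print("relative spread:", relative_spread(runs))
print("95% interval:", low, high)
```

For these made-up numbers the interval is wider than the max-min range, which matches the remark that a conservative estimate comes out somewhat bigger than the simple spread.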

Well, you should have written all this the first time: that you chose 95% probability (although it is the most widely used probability for confidence intervals) and did not use the normal distribution, so that people could better interpret all these numbers. Not everybody knows statistics, and some studied it long ago.

Still, the numbers are a bit useless, at least for me. What I (and I guess most people) wanted is just the overall score (with the confidence interval if you want, although it would take more time).

On the other hand, it is good that you calculated and wrote the confidence intervals and that I asked about them. Now I understand that the machine (or maybe all AMD machines, according to Jan) cannot be used to check gcc performance progress. According to Proebsting's law (a pseudo-law which is the compiler analog of Moore's law), compilers generate code that is 2 times better every 18 years, or less than 4% better on average every year. Actually, our progress from one release to another is sometimes less. So I need a better tool to measure progress over a small interval of time (a few months). Fortunately I have one (a Core2 machine; Itanium and my ppc machine are also accurate, but not as accurate as the Core2).
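
The "less than 4% per year" figure follows from compounding: a doubling over 18 years corresponds to an annual factor of 2^(1/18). A small Python check of that arithmetic (the function name is hypothetical):

```python
def annual_rate(doubling_years):
    """Annual improvement rate implied by a doubling every `doubling_years` years."""
    return 2.0 ** (1.0 / doubling_years) - 1.0

proebsting = annual_rate(18)
print(f"{proebsting:.2%} per year")
```

An improvement of this size is only measurable on a machine whose run-to-run noise is well below a few percent.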

I know a lot of people, and I agree with them, who say that there are different benchmarks and that benchmarking is evil. But it is better to have some rules than nothing. I like SPEC because it is the most acknowledged in the compiler world and it is not easy to cheat on it (such as by choosing one random benchmark like Whetstone and reporting an improvement).

[...] If the machine has only 512 MB of memory (even though they
write that it is enough for SPEC2000), the scores for some benchmark
programs may be unstable.  [...]


My box is equipped with 2 GB of RAM so I believe this is not the case.
Also, the computer was *absolutely* idle when it was running spec2k
(booted with init=/bin/sh and no other processes were running).

It might be the motherboard, the chipset, or a lot of other parameters that make the machine unsuitable for measuring gcc performance progress.
