Sorry for the double posting. (Netscape mailer sucks!) Also, my previous post was rejected because it exceeded the maximum post size, so I have had to leave out the RPM attachment. This post includes the comparison results only.

Alan Watson wrote:
>
> I've had two disconcerting experiences lately. Last month,
> we upgraded a Sun from Solaris 2.6 to Solaris 2.7; it got
> faster. Last week, I upgraded my dual 450 MHz PII machine
> from RedHat 5.2 (2.0.36) to RedHat 6.0 (2.2.5-15smp); it got
> slower.
>
> I'm working on a two-stage image compression method. The
> image is quantized in one process and fed into gzip or bzip2
> using popen(). In sh terms, this is "process1 | process2".
> This used to parallelize quite nicely, with one process
> running on each processor, but now about the best I can get
> is 120% CPU usage, often less. Overall, the pipe is slower.
>

Please find attached the following:

1) An RPM of nbench, containing several nbench binaries compiled with different compilers and optimization flags for comparison (not included; see above).
2) An HTML document summarizing the results.
3) A text version of the same document.

The short version is: with the egcs compiler, you need to compile with the -mpentiumpro switch on Pentium II machines. Binaries compiled with the -m386 or -m486 switch will run slower than binaries compiled with gcc!

I have tried to post this to Slashdot twice but could not get through. Please check it and spread the word; I think this is important. And no, the issue is not kernel-related.

--
----------------------------------------
Constantine Gavrilov
Unix System Administrator and Programmer
Orbotech
Yavne 81102, Israel
Phone: (972-8)-942-3064
Fax: (972-8)-942-3800
----------------------------------------

I did some work on the nbench benchmark suite and, as a result, created an nbench-byte RPM package. It includes nbench binaries compiled with different compilers and different optimization flags:
nbench (gcc version)
nbench.egcs (egcs 1.03 version -m486)
nbench.586 (egcs 1.03 version -mpentium)
nbench.ppro (egcs 1.03 version -mpentiumpro)
nbench.egcs.1.1.2 (egcs 1.1.2 version -m486)
nbench.586.1.1.2 (egcs 1.1.2 version -mpentium)
nbench.ppro.1.1.2 (egcs 1.1.2 version -mpentiumpro)
I benchmarked my home computer with these binaries. The results are summarized below and show the influence of the different compilers and of the -mpentium and -mpentiumpro flags. They are rather interesting. I will post the binary and source RPMs to the net so that everybody can verify the results.
Results

====================:===================::===================::===================::===================::===================::===================::===================:
TEST                :    gcc 2.7.2.3    ::     egcs 1.03     :: egcs 1.03 Pentium ::  egcs 1.03 Ppro   ::    egcs 1.1.2     ::egcs 1.1.2 Pentium ::  egcs 1.1.2 Ppro  :
                    :----------|--------::----------|--------::----------|--------::----------|--------::----------|--------::----------|--------::----------|--------:
                    : Iter/sec | Index  :: Iter/sec | Index  :: Iter/sec | Index  :: Iter/sec | Index  :: Iter/sec | Index  :: Iter/sec | Index  :: Iter/sec | Index  :
--------------------:----------|--------::----------|--------::----------|--------::----------|--------::----------|--------::----------|--------::----------|--------:
NUMERIC SORT        :    159.16| 1.34   ::    157.07| 1.32   ::    158.04| 1.33   ::    165.44| 1.39   ::    168.76| 1.42   ::    165.71| 1.40   ::    168.52| 1.42   :
STRING SORT         :    16.161| 1.12   ::     17.78| 1.23   ::    17.551| 1.21   ::    18.012| 1.25   ::    16.786| 1.16   ::    17.516| 1.21   ::    17.372| 1.20   :
BITFIELD            :5.2937e+07| 1.90   ::4.9756e+07| 1.78   ::4.9627e+07| 1.78   ::4.9685e+07| 1.78   ::4.8988e+07| 1.76   ::4.8866e+07| 1.75   ::4.9015e+07| 1.76   :
FP EMULATION        :    9.8535| 1.09   ::     12.04| 1.33   ::    12.089| 1.34   ::    16.627| 1.84   ::    10.689| 1.18   ::    10.638| 1.18   ::      15.8| 1.75   :
FOURIER             :    3428.8| 2.19   ::    4170.7| 2.66   ::    4172.3| 2.67   ::    4154.1| 2.65   ::    4013.3| 2.56   ::    4127.6| 2.64   ::    4121.1| 2.63   :
ASSIGNMENT          :    2.3051| 2.28   ::     1.907| 1.88   ::    1.9534| 1.93   ::     2.186| 2.16   ::    1.9593| 1.93   ::    1.9881| 1.96   ::    2.1577| 2.13   :
IDEA                :    356.72| 1.62   ::     336.1| 1.53   ::    342.78| 1.56   ::    410.64| 1.86   ::    331.51| 1.51   ::     333.2| 1.51   ::    404.47| 1.84   :
HUFFMAN             :    169.81| 1.50   ::    148.42| 1.31   ::    150.04| 1.33   ::    163.16| 1.44   ::     149.2| 1.32   ::    154.62| 1.37   ::    167.16| 1.48   :
NEURAL NET          :    2.6266| 1.77   ::    3.4364| 2.32   ::    3.3386| 2.26   ::     3.483| 2.35   ::    4.5617| 3.08   ::    4.5581| 3.08   ::    4.8244| 3.26   :
LU DECOMPOSITION    :    125.12| 4.68   ::    137.48| 5.14   ::    135.12| 5.05   ::    153.84| 5.75   ::    145.16| 5.43   ::    146.08| 5.46   ::    151.32| 5.66   :
====================:===================::===================::===================::===================::===================::===================::===================:
MEMORY INDEX        :       1.690       ::       1.604       ::       1.608       ::       1.685       ::       1.579       ::       1.609       ::       1.650       :
====================:===================::===================::===================::===================::===================::===================::===================:
INTEGER INDEX       :       1.374       ::       1.371       ::       1.385       ::       1.621       ::       1.352       ::       1.358       ::       1.612       :
====================:===================::===================::===================::===================::===================::===================::===================:
FLOATING INDEX      :       2.630       ::       3.169       ::       3.121       ::       3.300       ::       3.501       ::       3.540       ::       3.649       :
====================:===================::===================::===================::===================::===================::===================::===================:
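The three summary rows appear to be geometric means of the per-test Index values. The sketch below assumes the test grouping used by the Linux/Unix port of nbench: memory covers string sort, bitfield and assignment; integer covers numeric sort, FP emulation, IDEA and Huffman; floating point covers Fourier, neural net and LU decomposition. It reproduces the summary rows to within a few thousandths. The remaining difference comes from the per-test indexes in the table already being rounded to two decimals. Only two columns are transcribed, and the dictionary names are illustrative.

```python
from math import prod

# Per-test Index values copied from the table (two columns only).
INDEX = {
    "gcc 2.7.2.3": {
        "NUMERIC SORT": 1.34, "STRING SORT": 1.12, "BITFIELD": 1.90,
        "FP EMULATION": 1.09, "FOURIER": 2.19, "ASSIGNMENT": 2.28,
        "IDEA": 1.62, "HUFFMAN": 1.50, "NEURAL NET": 1.77,
        "LU DECOMPOSITION": 4.68,
    },
    "egcs 1.1.2 Ppro": {
        "NUMERIC SORT": 1.42, "STRING SORT": 1.20, "BITFIELD": 1.76,
        "FP EMULATION": 1.75, "FOURIER": 2.63, "ASSIGNMENT": 2.13,
        "IDEA": 1.84, "HUFFMAN": 1.48, "NEURAL NET": 3.26,
        "LU DECOMPOSITION": 5.66,
    },
}

# Assumed grouping of the ten tests into the three summary indexes.
GROUPS = {
    "MEMORY INDEX": ["STRING SORT", "BITFIELD", "ASSIGNMENT"],
    "INTEGER INDEX": ["NUMERIC SORT", "FP EMULATION", "IDEA", "HUFFMAN"],
    "FLOATING INDEX": ["FOURIER", "NEURAL NET", "LU DECOMPOSITION"],
}


def geometric_mean(values):
    """n-th root of the product of n values."""
    values = list(values)
    return prod(values) ** (1.0 / len(values))


def summary_indexes(per_test):
    """Combine one column's per-test indexes into the three summary indexes."""
    return {name: geometric_mean(per_test[test] for test in tests)
            for name, tests in GROUPS.items()}


if __name__ == "__main__":
    for column, per_test in INDEX.items():
        for name, value in summary_indexes(per_test).items():
            print(f"{column:16s} {name:15s} {value:.3f}")
```

A geometric mean keeps a single unusually fast test, such as LU decomposition, from dominating its group the way an arithmetic mean would.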
* The benchmarked CPU is GenuineIntel Pentium II (Deschutes) 375 MHz 512 KB cache.
* SuperMicro P6DGU motherboard, 256 MB RAM, 75 MHz bus clock.
* gcc and egcs (486) compilation flags:
-s -static -O3 -fomit-frame-pointer -Wall -m486 -fforce-addr \
-fforce-mem -malign-loops=2 -malign-functions=2 -malign-jumps=2 \
-funroll-loops
* egcs Pentium compilation flags:
-s -static -O3 -fomit-frame-pointer -Wall \
-fforce-addr -fforce-mem -malign-loops=2 -malign-functions=2 \
-malign-jumps=2 -funroll-loops -mpentium
* egcs Ppro optimized compilation flags:
-s -static -O3 -fomit-frame-pointer -Wall \
-fforce-addr -fforce-mem -malign-loops=2 -malign-functions=2 \
-malign-jumps=2 -funroll-loops -mpentiumpro
* The same egcs compiler binaries were used to build the 486-, Pentium- and PPro-optimized
binaries; only the compilation flags differed.
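To show that the builds differ only in the CPU option, the flag sets above can be kept as one shared list plus a per-CPU switch. The following is a minimal sketch that only prints the command lines and runs nothing. The compiler path `/opt/egcs-1.1.2/bin/gcc`, the source file names and the `compile_command` helper are all hypothetical. The CPU option is placed last rather than after `-Wall`, which should not matter because none of these options conflict.

```python
import shlex

# Flags shared by every build, as listed above.
COMMON_FLAGS = [
    "-s", "-static", "-O3", "-fomit-frame-pointer", "-Wall",
    "-fforce-addr", "-fforce-mem",
    "-malign-loops=2", "-malign-functions=2", "-malign-jumps=2",
    "-funroll-loops",
]

# The only option that differs between the three builds.
CPU_FLAGS = {"486": "-m486", "pentium": "-mpentium", "ppro": "-mpentiumpro"}

# Hypothetical compiler path; adjust to the local egcs installation.
BUILDS = {
    "nbench.egcs.1.1.2": ("/opt/egcs-1.1.2/bin/gcc", "486"),
    "nbench.586.1.1.2": ("/opt/egcs-1.1.2/bin/gcc", "pentium"),
    "nbench.ppro.1.1.2": ("/opt/egcs-1.1.2/bin/gcc", "ppro"),
}
SOURCES = ["nbench0.c", "nbench1.c"]  # hypothetical, not the full source list


def compile_command(compiler, cpu, sources, output):
    """Return the argv list for one build (nothing is executed here)."""
    return [compiler, *COMMON_FLAGS, CPU_FLAGS[cpu], "-o", output, *sources]


if __name__ == "__main__":
    for output, (compiler, cpu) in BUILDS.items():
        print(shlex.join(compile_command(compiler, cpu, SOURCES, output)))
```

Keeping the shared flags in one list makes it harder for the three builds to drift apart accidentally.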
Observations:
- The latest versions of the egcs compiler provide significant speed-ups on Pentium and Pentium Pro processors, especially for floating-point-intensive code. For the -m486 builds, the FLOATING INDEX rises from 2.630 with gcc 2.7.2.3 to 3.169 with egcs 1.03 and 3.501 with egcs 1.1.2.
- Binaries that egcs optimizes for one CPU type ("A") can show a significant performance degradation in memory- and integer-intensive operations when run on another CPU type ("B"). This is especially true for today's most common CPU, the Pentium II (Pentium Pro architecture). For example, with egcs 1.1.2 the INTEGER INDEX is 1.352 for the -m486 build versus 1.612 for the -mpentiumpro build. For the latest version of egcs this effect seems to be even stronger.
- Binaries that egcs does not optimize for the Pentium Pro will not reach optimal performance in floating-point-intensive applications on Pentium II / Pentium Pro machines. Such applications can be sped up significantly if the binaries are optimized for the Pentium Pro. This effect seems to be stronger for the latest version of egcs.
- The latest version of egcs seems to generate binaries that run slower in memory- and integer-intensive operations, at least on Pentium II / Pentium Pro CPUs. For the -m486 builds, egcs 1.1.2 scores 1.579 (MEMORY) and 1.352 (INTEGER), versus 1.604 and 1.371 for egcs 1.03. Different optimization flags may correct this. A word of advice, anyone?
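To put numbers on the observations above, the per-test effect of -mpentiumpro can be expressed as a percent change in Iter/sec, where higher is better. The sketch below compares egcs 1.1.2 built with -m486 against the same compiler with -mpentiumpro, using the Iter/sec figures from the table. The names `M486`, `PPRO`, `percent_change` and `compare` are illustrative only.

```python
# Iter/sec figures copied from the results table: egcs 1.1.2 with -m486
# versus egcs 1.1.2 with -mpentiumpro.
M486 = {
    "NUMERIC SORT": 168.76, "STRING SORT": 16.786, "BITFIELD": 4.8988e07,
    "FP EMULATION": 10.689, "FOURIER": 4013.3, "ASSIGNMENT": 1.9593,
    "IDEA": 331.51, "HUFFMAN": 149.2, "NEURAL NET": 4.5617,
    "LU DECOMPOSITION": 145.16,
}
PPRO = {
    "NUMERIC SORT": 168.52, "STRING SORT": 17.372, "BITFIELD": 4.9015e07,
    "FP EMULATION": 15.8, "FOURIER": 4121.1, "ASSIGNMENT": 2.1577,
    "IDEA": 404.47, "HUFFMAN": 167.16, "NEURAL NET": 4.8244,
    "LU DECOMPOSITION": 151.32,
}


def percent_change(before, after):
    """Relative change in throughput; positive means the second build is faster."""
    return 100.0 * (after - before) / before


def compare(base, other):
    """Per-test percent change, sorted from largest gain to largest loss."""
    changes = {test: percent_change(base[test], other[test]) for test in base}
    return dict(sorted(changes.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    for test, pct in compare(M486, PPRO).items():
        print(f"{test:17s} {pct:+6.1f}%")
```

On these figures FP EMULATION (about +48%) and IDEA (about +22%) gain the most from -mpentiumpro, while NUMERIC SORT is essentially unchanged.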
Conclusions:
Most distribution vendors (including RedHat) have switched to the egcs compiler. However, they ship binaries generated for 486 CPUs. Typically, these binaries will show degraded performance on today's most common CPU, the Pentium II; at best, they will not show optimal performance. Distribution vendors must therefore be pressed to compile packages optimized for Pentium Pro CPUs by default. Ever wondered why Linux does not seem as fast as it could be on modern machines? The secret is hidden in the binaries compiled with the -m486 flag!
