The following program is based on the gaz_dyn.f90 test of Polyhedron; there "gfortran -march=opteron -ffast-math -funroll-loops -ftree-vectorize -msse3 -O3 -g" needs 0m13.999s whereas ifort 9.1 "-O3 -xW -ipo -no-prec-div -static -V -g" needs 0m7.638s. See http://www.polyhedron.com/pb05/linux/f90bench_AMD.html
The following cut-down and C program needs with "icc -O3 -xW -no-prec-div -static -V -g" 0m2.406s and with "gcc -march=opteron -ffast-math -funroll-loops -ftree-vectorize -msse3 -O3 -g" 0m7.212s void eos(const int NODES, const float CGAMMA, float CS[], float PRES[], float DENS[]) { int j; for(j = 0; j < NODES; j ++) { CS[j] = sqrt(CGAMMA*PRES[j]/DENS[j]); } } int main() { const int NODES = 25000; float CGAMMA; float DENS[NODES], CS[NODES], PRES[NODES]; int i,j; for(i = 0; i < NODES; i++) { DENS[i] = 3.0; PRES[i] = 0.25; } CGAMMA = 2.0; for(i = 0; i < 20000; i++) { eos(NODES, CGAMMA, &CS, &PRES, &DENS); CGAMMA = CGAMMA + CS[1]; } return (int)CGAMMA; } -- Summary: sqrt(CGAMMA*PRES[j]/DENS[j]) much slower than compiting compiler Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: burnus at gcc dot gnu dot org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30032