------- Comment #6 from burnus at gcc dot gnu dot org 2007-03-12 08:16 -------

> Can someone try instead of doing "__real__ a += w[j] *__real__ mfi[*index];"
> Use "a+= xxx* yyy" and also use -std=c99 to get the correct multiplication?
Well, -std=c99 was already used, and the "real(!) * complex" calculation was already correct. "c_cmplx" below now uses:

  a += w[j] * mfi[*index++];

Compiled with:

  gcc -std=c99 -O3 -funroll-loops -ftree-vectorize -march=opteron -msse3 -ffast-math -m64
  gfortran -O3 -funroll-loops -ftree-vectorize -march=opteron -msse3 -ffast-math -m64

Timings:

  Fortran:  0.4360271  0.4280267
  c_nosse:  0.2440166  0.2320151
  c_sse:    0.2320137  0.2400150
  c_struct: 0.2320151  0.2320147
  c_cmplx:  0.2360163  0.2320147

And using a non-manually-unrolled version: 0.3760242, 0.3760242

  for(i = 0; i < np; i++) {
    for(j = 1; j < n; j++)
      a += w[j] * mfi[*index++];
    fo[i] = a;
  }

Thus the manual unrolling seems to account for most of the speedup. With -funroll-all-loops, the timings of the Fortran version and of the non-unrolled C version remain the same.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31139