------- Comment #6 from burnus at gcc dot gnu dot org  2007-03-12 08:16 -------
> Can someone try instead of doing "__real__ a += w[j] *__real__ mfi[*index];"
> Use "a+= xxx* yyy" and also use -std=c99 to get the correct multiplication?

Well, -std=c99 was already used, and the "real(!) * complex" calculation was
already correct. "c_cmplx" below now uses:
      a += w[j    ] * mfi[*index++];
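
Below is a minimal C99 sketch of the "c_cmplx" style of accumulation. It assumes `w` is a `double` array, `mfi` a `double complex` array and `index` an `int` array of gather indices. The function name `gather_sum` is hypothetical and does not come from the benchmark. C99 Annex G specifies that multiplying a real operand by a complex one is done as two real multiplications, (x*u) + i(x*v), rather than as a full complex product. The comment above indicates that GCC already did this with -std=c99.

```c
#include <complex.h>

/* Hypothetical gather-and-accumulate kernel. The names w, mfi and index
   follow the fragment above; their types are assumptions. */
static double complex gather_sum(const double *w, const double complex *mfi,
                                 const int *index, int n)
{
    double complex a = 0.0;
    for (int j = 0; j < n; j++)
        a += w[j] * mfi[index[j]];   /* real * complex: scales both parts */
    return a;
}
```

For example, with w = {1, 2, 0.5}, mfi = {1+2i, 3-1i, 4i, 2} and index = {2, 0, 3}, the sum is 4i + (2+4i) + 1 = 3+8i.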

Compiled with:
gcc -std=c99 -O3 -funroll-loops -ftree-vectorize -march=opteron -msse3 -ffast-math -m64
gfortran -O3 -funroll-loops -ftree-vectorize -march=opteron -msse3 -ffast-math -m64

 Fortran:   0.4360271
 Fortran:   0.4280267
 c_nosse:   0.2440166
 c_nosse:   0.2320151
 c_sse:     0.2320137
 c_sse:     0.2400150
 c_struct:  0.2320151
 c_struct:  0.2320147
 c_cmplx:   0.2360163
 c_cmplx:   0.2320147
And with a version that is not manually unrolled: 0.3760242, 0.3760242
  for(i = 0; i < np ; i++) {
    for(j = 1; j < n; j++)
      a += w[j    ] * mfi[*index++];
    fo[i] = a;
  }
Thus, the manual unrolling seems to account for most of the speedup. With
-funroll-all-loops, the timings of the Fortran and the non-unrolled C versions
remain unchanged.
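
The manual unrolling discussed above can be sketched as follows. This is a hypothetical reconstruction and not the benchmark's actual source. Several details are assumptions: the function names `accumulate_simple` and `accumulate_unrolled4`, the argument types, and resetting `a` to zero for each `i`, since the fragment does not show where `a` is initialized. The unrolled version handles four terms per iteration and uses a remainder loop for the rest. It keeps a single accumulator, so the additions happen in the same order as in the simple loop.

```c
#include <complex.h>

/* Modeled on the fragment above: one term per inner iteration.
   Resetting a per i is an assumption. */
static void accumulate_simple(double complex *fo, const double *w,
                              const double complex *mfi, const int *index,
                              int np, int n)
{
    for (int i = 0; i < np; i++) {
        double complex a = 0.0;
        for (int j = 1; j < n; j++)
            a += w[j] * mfi[*index++];
        fo[i] = a;
    }
}

/* Same computation, manually unrolled by four (hypothetical). */
static void accumulate_unrolled4(double complex *fo, const double *w,
                                 const double complex *mfi, const int *index,
                                 int np, int n)
{
    for (int i = 0; i < np; i++) {
        double complex a = 0.0;
        int j = 1;
        /* Main body: four terms per iteration, same summation order. */
        for (; j + 3 < n; j += 4) {
            a += w[j]     * mfi[index[0]];
            a += w[j + 1] * mfi[index[1]];
            a += w[j + 2] * mfi[index[2]];
            a += w[j + 3] * mfi[index[3]];
            index += 4;
        }
        /* Remainder: at most three leftover terms. */
        for (; j < n; j++)
            a += w[j] * mfi[*index++];
        fo[i] = a;
    }
}
```

The summation order is unchanged, so both functions should give identical results even without -ffast-math. Splitting `a` into several partial sums would expose more instruction-level parallelism. However, it reorders the floating-point additions, which GCC only does on its own under flags such as -ffast-math.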


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31139
