Hi Matt,
[timings]
Intel AVX2:
C_SW 1.4931
D_SW 5.4254
PG_D 1.0878
TRACER_2D 24.7418
REMAPPING 27.2644
Now I looked at GNU Fortran (7.3.0). Here my "stock" flags are quite
boring (and all flags, not just the optimization ones):
[Various options elided, the best was]:
GNU Haswell NoFMA Repack:
C_SW 2.4350
D_SW 9.7109
PG_D 0.7869
TRACER_2D 163.6474
REMAPPING 100.6820
So, my questions to you gurus are: Is there something I could try adding
to my gfortran options that might help with this discrepancy between
Intel AVX2 and GCC? Or perhaps I need to *remove* something (some flag
kills the vectorizer)?
First, the gcc 8 release is just around the corner, and a lot of
improvements have been made to code generation, including for AVX2. You
might want to give the current trunk, the soon-to-be-released release
candidate, or gcc 8 itself once it is out, a spin.
Second, this performance gap with respect to Intel (a factor of 6.6 for
your TRACER_2D routine) is dramatic. If anything like this persists in
gcc8, the only way to get this fixed is to submit a bug report.
Profile the code and try to reduce it to a small test case that shows
the problem and that you can put in a bug report.
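To see which routines are worth profiling first, here is a small Python
sketch that works out the GNU/Intel ratio for each routine from the
timings quoted above. The dictionary and function names are mine and
purely illustrative; only the numbers come from your mail.

```python
# Timings copied from the two tables quoted above.
intel_avx2 = {
    "C_SW": 1.4931,
    "D_SW": 5.4254,
    "PG_D": 1.0878,
    "TRACER_2D": 24.7418,
    "REMAPPING": 27.2644,
}
gnu_haswell_nofma_repack = {
    "C_SW": 2.4350,
    "D_SW": 9.7109,
    "PG_D": 0.7869,
    "TRACER_2D": 163.6474,
    "REMAPPING": 100.6820,
}


def slowdown(gnu, intel):
    """Return the GNU/Intel time ratio per routine (>1 means gfortran is slower)."""
    return {name: gnu[name] / intel[name] for name in intel}


ratios = slowdown(gnu_haswell_nofma_repack, intel_avx2)

# Print the routines with the largest gap first.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<10} {ratio:5.2f}x")
```

TRACER_2D comes out on top, with REMAPPING next, so those two are the
natural candidates for a reduced test case. PG_D is actually faster
with gfortran.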
Regards
Thomas