Hi Matt,
[timings]

Intel AVX2:

    C_SW       1.4931
    D_SW       5.4254
    PG_D       1.0878
    TRACER_2D 24.7418
    REMAPPING 27.2644

Now I looked at GNU Fortran (7.3.0). Here my "stock" flags are quite boring (and all flags, not just the optimization ones):

[Various options elided, the best was]:

GNU Haswell NoFMA Repack:
    C_SW        2.4350
    D_SW        9.7109
    PG_D        0.7869
    TRACER_2D 163.6474
    REMAPPING 100.6820

So, my questions to you gurus are: Is there something I could try adding to my gfortran options that might help with this discrepancy between Intel AVX2 and GCC? Or perhaps I need to *remove* something (some flag kills the vectorizer)?

The gcc 8 release is just around the corner, and a lot of improvements
have been made to code generation, also for AVX2. You might want to give
the current trunk, the soon-to-be-released release candidate, or the
then newly released gcc 8 a spin.

Second, this performance gap with respect to Intel (a factor of 6.6 for
your TRACER_2D routine) is dramatic. If anything like this persists in
gcc 8, the only way to get it fixed is to submit a bug report. Profile
the code and try to reduce it to something small that still shows the
problem and that you can put in the bug report.
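
To decide which routine to profile and reduce first, it may help to put
the two sets of timings side by side. The following is only a small
sketch in Python that divides the GNU numbers by the Intel numbers quoted
above; the dictionary and function names are my own and not part of your
benchmark.

```python
# Timings copied from the two tables quoted in this thread
# (same units as reported there).
intel_avx2 = {
    "C_SW": 1.4931,
    "D_SW": 5.4254,
    "PG_D": 1.0878,
    "TRACER_2D": 24.7418,
    "REMAPPING": 27.2644,
}
gnu_haswell_nofma_repack = {
    "C_SW": 2.4350,
    "D_SW": 9.7109,
    "PG_D": 0.7869,
    "TRACER_2D": 163.6474,
    "REMAPPING": 100.6820,
}


def slowdown(gnu, intel):
    """Return GNU time divided by Intel time for every routine in both sets."""
    return {name: gnu[name] / intel[name] for name in intel if name in gnu}


ratios = slowdown(gnu_haswell_nofma_repack, intel_avx2)

# Print the routines from worst to best ratio, so the first line is the
# most promising candidate for a reduced test case.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<10} {ratio:5.2f}x")
```

TRACER_2D comes out on top with the factor of about 6.6 mentioned above,
followed by REMAPPING, while PG_D is actually faster with gfortran. So
TRACER_2D looks like the natural place to start.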

Regards

        Thomas
