On Thu, Apr 19, 2018 at 8:33 AM, Thomas Koenig <tkoe...@netcologne.de> wrote:
> Hi Matt,
> [timings]
>
>> Intel AVX2:
>>
>> C_SW 1.4931
>> D_SW 5.4254
>> PG_D 1.0878
>> TRACER_2D 24.7418
>> REMAPPING 27.2644
>
>
>> Now I looked at GNU Fortran (7.3.0). Here my "stock" flags are quite
>> boring (and all flags, not just the optimization ones):
>
>
> [Various options elided, the best was]:
>
>> GNU Haswell NoFMA Repack:
>> C_SW 2.4350
>> D_SW 9.7109
>> PG_D 0.7869
>> TRACER_2D 163.6474
>> REMAPPING 100.6820
>>
>> So, my questions to you gurus are: Is there something I could try adding
>> to my gfortran options that might help with this discrepancy between Intel
>> AVX2 and GCC? Or perhaps I need to *remove* something (some flag kills the
>> vectorizer)?
>
> The gcc 8 release is just around the corner, and a lot of improvements
> have been made to code generation, also for AVX2. You might want to give
> the current trunk (or the soon-to-be-released release candidate), or the
> then newly released gcc8 a spin.
>
> Second, this performance gap with respect to Intel (a factor of 6.6 for
> your TRACER_2D routine) is dramatic. If anything like this persists in gcc8,
> the only way to get this fixed is to submit a bug report.
> Profile the code, try to reduce the code to something that shows
> the problem (and that you can put in a bug report).
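
To see which routines are worth reducing to a test case first, the quoted timings can be turned into per-routine ratios. This is only a small sketch: the numbers are copied from the message above, while the dictionary names and the slowdown() helper are made up for illustration and are not part of anyone's build or benchmark setup.

```python
# Timings copied from the quoted message (the thread does not state the unit).
intel_avx2 = {
    "C_SW": 1.4931,
    "D_SW": 5.4254,
    "PG_D": 1.0878,
    "TRACER_2D": 24.7418,
    "REMAPPING": 27.2644,
}
gnu_haswell_nofma_repack = {
    "C_SW": 2.4350,
    "D_SW": 9.7109,
    "PG_D": 0.7869,
    "TRACER_2D": 163.6474,
    "REMAPPING": 100.6820,
}


def slowdown(gnu, intel):
    """Return the GNU/Intel time ratio per routine (hypothetical helper)."""
    return {name: gnu[name] / intel[name] for name in intel}


ratios = slowdown(gnu_haswell_nofma_repack, intel_avx2)

# Print the routines with the largest gap first.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:10s} {ratio:5.2f}x")
```

TRACER_2D comes out on top, matching the factor of 6.6 quoted above, with REMAPPING next. PG_D has a ratio below 1, so gfortran is already ahead on that routine.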

Depending on what those routines do (do they call math intrinsics like
sin or cos?), ICC has an advantage with its highly optimized vectorized
math library. You can use that from gfortran as well by passing
-mveclibabi=svml and linking against libsvml.{a,so}, which comes with ICC.
Unfortunately, gfortran cannot make use of glibc's libmvec at the moment.

Richard.

> Regards
>
> Thomas