On Thu, Apr 19, 2018 at 8:33 AM, Thomas Koenig <tkoe...@netcologne.de> wrote:
> Hi Matt,
> [timings]
>
>> Intel AVX2:
>>
>>     C_SW       1.4931
>>     D_SW       5.4254
>>     PG_D       1.0878
>>     TRACER_2D 24.7418
>>     REMAPPING 27.2644
>
>
>> Now I looked at GNU Fortran (7.3.0). Here my "stock" flags are quite
>> boring (and all flags, not just the optimization ones):
>
>
> [Various options elided, the best was]:
>
>> GNU Haswell NoFMA Repack:
>>     C_SW        2.4350
>>     D_SW        9.7109
>>     PG_D        0.7869
>>     TRACER_2D 163.6474
>>     REMAPPING 100.6820
>>
>> So, my questions to you gurus are: Is there something I could try adding
>> to my gfortran options that might help with this discrepancy between Intel
>> AVX2 and GCC? Or perhaps I need to *remove* something (some flag kills the
>> vectorizer)?
>
> The gcc 8 release is just around the corner, and a lot of improvements
> have been made to code generation, also for AVX2. You might want to give
> the current trunk, the soon-to-be-released release candidate, or the
> then newly released gcc8 a spin.
>
> Second, this performance gap with respect to Intel (a factor of 6.6 for
> your TRACER_2D routine) is dramatic. If anything like this persists in gcc8,
> the only way to get this fixed is to submit a bug report.
> Profile the code, then try to reduce it to something that shows
> the problem (and that you can put in a bug report).

Depending on what those routines do (do they call math intrinsics like
sin or cos?), ICC has an advantage thanks to its highly optimized vectorized
math library.  You can use that from gfortran as well by compiling with
-mveclibabi=svml and linking against libsvml.{a,so}, which comes with ICC.
Unfortunately, gfortran cannot make use of glibc's libmvec at the moment.
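
A minimal sketch of how one might check this, under some assumptions: the
file name sin_loop.f90 and the library path in the final comment are
hypothetical placeholders, and the kernel is a toy loop, not one of the
routines timed above. The GCC manual states that -mveclibabi needs the
vectorizer and -funsafe-math-optimizations enabled, which -O3 and
-ffast-math take care of here. Compiling to assembly only means the check
works without libsvml being installed.

```shell
#!/bin/sh
# Toy kernel (hypothetical, for illustration): a loop calling sin(), so the
# vectorizer has a math intrinsic it can replace with a vector-library call.
cat > sin_loop.f90 <<'EOF'
program sin_loop
  implicit none
  integer, parameter :: n = 1024
  real(kind=8) :: x(n), y(n)
  integer :: i
  do i = 1, n
     x(i) = 1.0d-3 * i
  end do
  do i = 1, n
     y(i) = sin(x(i))
  end do
  print *, sum(y)
end program sin_loop
EOF

# -O3 enables the vectorizer; -ffast-math implies -funsafe-math-optimizations.
FFLAGS="-O3 -march=haswell -ffast-math -mveclibabi=svml"

if command -v gfortran >/dev/null 2>&1; then
  # Stop at assembly (-S) so no SVML library is needed for this check.
  if gfortran $FFLAGS -S sin_loop.f90 -o sin_loop.s; then
    # Count lines mentioning "vml"; SVML entry points are expected to carry
    # that prefix, but treat the exact symbol names as an assumption.
    grep -c 'vml' sin_loop.s || echo "no SVML-style calls found"
  fi
  # A full build would also have to link against ICC's library, e.g.
  # (the path is a placeholder):
  #   gfortran $FFLAGS sin_loop.f90 -L/path/to/icc/lib -lsvml -o sin_loop
fi
```

If the count is zero, the loop was probably not vectorized with a library
call, and -fopt-info-vec-missed may give a hint as to why.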

Richard.

> Regards
>
>         Thomas
