https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #50 from Andrew Roberts <andrewm.roberts at sky dot com> ---
With the matrix.c benchmark on Ryzen, looking at the other options when using
-march=znver1 and -mtune=znver1:

mult took 225281 clocks -march=znver1 -mtune=znver1 -mprefer-vector-width=128
mult took 185961 clocks -march=znver1 -mtune=znver1 -mprefer-vector-width=256
mult took 187577 clocks -march=znver1 -mtune=znver1 -mprefer-vector-width=512

Adding -mno-avx2 has no effect on the above baseline.

Adding -mno-fma:

mult took 223302 clocks -march=znver1 -mtune=znver1 -mprefer-vector-width=128 -mno-fma
mult took 123773 clocks -march=znver1 -mtune=znver1 -mprefer-vector-width=256 -mno-fma
mult took 124690 clocks -march=znver1 -mtune=znver1 -mprefer-vector-width=512 -mno-fma
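
As a quick check on the size of the -mno-fma effect, dividing each baseline clock count by its -mno-fma counterpart gives the speedup per vector width. This is only arithmetic on the single runs reported above (no repeats, no variance), sketched in Python:

```python
# Clock counts copied from the znver1 runs above (lower is faster).
baseline = {128: 225281, 256: 185961, 512: 187577}  # -march=znver1 -mtune=znver1
no_fma = {128: 223302, 256: 123773, 512: 124690}    # same flags plus -mno-fma


def speedup(before: int, after: int) -> float:
    """Return how many times faster `after` is than `before`."""
    return before / after


for width in sorted(baseline):
    print(f"-mprefer-vector-width={width}: "
          f"{speedup(baseline[width], no_fma[width]):.2f}x with -mno-fma")
```

It prints 1.01x for width 128 and 1.50x for both 256 and 512, so disabling FMA only pays off here once 256-bit vectors are in use.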

Is the patch in trunk yet? I was assuming it was, based on the other comments.

Using -march=ivybridge but keeping the rest of the options:
mult took 215052 clocks -march=ivybridge -mtune=znver1 -mprefer-vector-width=128 -mno-fma
mult took 121661 clocks -march=ivybridge -mtune=znver1 -mprefer-vector-width=256 -mno-fma
mult took 131763 clocks -march=ivybridge -mtune=znver1 -mprefer-vector-width=512 -mno-fma

Switching to -march=ivybridge -mtune=skylake-avx512 and dropping the other
options (still on Ryzen):
mult took 119195 clocks -march=ivybridge -mtune=skylake-avx512

With -march=znver1 -mtune=skylake-avx512 and dropping the other options:
mult took 182799 clocks -march=znver1 -mtune=skylake-avx512

So the combination of -march=ivybridge -mtune=skylake-avx512 is doing something
right.
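
Putting the main Ryzen runs on one scale makes the comparison easier. The sketch below expresses each configuration's speed relative to the -march=znver1 -mtune=znver1 -mprefer-vector-width=256 run; the choice of that reference run is arbitrary, and the numbers are single runs copied from above:

```python
# Single-run clock counts reported above, all measured on Ryzen (lower is faster).
runs = {
    "-march=znver1 -mtune=znver1 -mprefer-vector-width=256": 185961,
    "-march=znver1 -mtune=znver1 -mprefer-vector-width=256 -mno-fma": 123773,
    "-march=ivybridge -mtune=znver1 -mprefer-vector-width=256 -mno-fma": 121661,
    "-march=ivybridge -mtune=skylake-avx512": 119195,
    "-march=znver1 -mtune=skylake-avx512": 182799,
}

# Reference configuration (an arbitrary pick among the runs above).
BASELINE = "-march=znver1 -mtune=znver1 -mprefer-vector-width=256"


def relative_speed(flags: str) -> float:
    """Speed of a configuration relative to BASELINE (>1.0 means faster)."""
    return runs[BASELINE] / runs[flags]


# Print fastest (fewest clocks) first.
for flags in sorted(runs, key=runs.get):
    print(f"{relative_speed(flags):.2f}x  {flags}")
```

This prints 1.56x, 1.53x, 1.50x, 1.02x and 1.00x, fastest first. The three fast configurations all avoid FMA: -mno-fma disables it explicitly, and -march=ivybridge never enables it because Ivy Bridge predates FMA3. One plausible reading is that FMA code generation, rather than the skylake-avx512 tuning itself, accounts for most of the gap, since -march=znver1 -mtune=skylake-avx512 stays slow.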
