https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #50 from Andrew Roberts <andrewm.roberts at sky dot com> ---
With the matrix.c benchmark on Ryzen, looking at the other options when using -march=znver1 and -mtune=znver1:

mult took 225281 clocks -march=znver1 -mtune=znver1 -mprefer-vector-width=128
mult took 185961 clocks -march=znver1 -mtune=znver1 -mprefer-vector-width=256
mult took 187577 clocks -march=znver1 -mtune=znver1 -mprefer-vector-width=512

Adding -mno-avx2 has no effect on the above baseline.

Adding in -mno-fma:

mult took 223302 clocks -march=znver1 -mtune=znver1 -mprefer-vector-width=128 -mno-fma
mult took 123773 clocks -march=znver1 -mtune=znver1 -mprefer-vector-width=256 -mno-fma
mult took 124690 clocks -march=znver1 -mtune=znver1 -mprefer-vector-width=512 -mno-fma

Is the patch in trunk yet? I was assuming it was from the other comments.

Using -march=ivybridge but keeping the rest of the options:

mult took 215052 clocks -march=ivybridge -mtune=znver1 -mprefer-vector-width=128 -mno-fma
mult took 121661 clocks -march=ivybridge -mtune=znver1 -mprefer-vector-width=256 -mno-fma
mult took 131763 clocks -march=ivybridge -mtune=znver1 -mprefer-vector-width=512 -mno-fma

Switching to -march=ivybridge -mtune=skylake-avx512 and dropping the other options (still on Ryzen):

mult took 119195 clocks -march=ivybridge -mtune=skylake-avx512

With -march=znver1 -mtune=skylake-avx512 and dropping the other options:

mult took 182799 clocks -march=znver1 -mtune=skylake-avx512

So the combination of -march=ivybridge -mtune=skylake-avx512 is doing something right.
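
To put the numbers above in relative terms, here is a small sketch that ranks the flag combinations by speedup. The clock counts are copied from the runs quoted in this comment. Using the first znver1 run (-mprefer-vector-width=128) as the reference point is an assumption made for illustration, and the names `timings`, `relative_to_baseline` and `BASELINE` are hypothetical helpers, not part of matrix.c or GCC.

```python
# Clock counts copied from the runs quoted above (all measured on Ryzen).
timings = {
    "-march=znver1 -mtune=znver1 -mprefer-vector-width=128": 225281,
    "-march=znver1 -mtune=znver1 -mprefer-vector-width=256": 185961,
    "-march=znver1 -mtune=znver1 -mprefer-vector-width=512": 187577,
    "-march=znver1 -mtune=znver1 -mprefer-vector-width=128 -mno-fma": 223302,
    "-march=znver1 -mtune=znver1 -mprefer-vector-width=256 -mno-fma": 123773,
    "-march=znver1 -mtune=znver1 -mprefer-vector-width=512 -mno-fma": 124690,
    "-march=ivybridge -mtune=znver1 -mprefer-vector-width=128 -mno-fma": 215052,
    "-march=ivybridge -mtune=znver1 -mprefer-vector-width=256 -mno-fma": 121661,
    "-march=ivybridge -mtune=znver1 -mprefer-vector-width=512 -mno-fma": 131763,
    "-march=ivybridge -mtune=skylake-avx512": 119195,
    "-march=znver1 -mtune=skylake-avx512": 182799,
}

# Assumption: the first znver1 run is used as the reference point.
BASELINE = "-march=znver1 -mtune=znver1 -mprefer-vector-width=128"


def relative_to_baseline(timings, baseline_flags):
    """Return speedup factors (baseline clocks / run clocks) per flag set."""
    base = timings[baseline_flags]
    return {flags: base / clocks for flags, clocks in timings.items()}


speedups = relative_to_baseline(timings, BASELINE)

# Print the runs from fastest to slowest.
for flags, factor in sorted(speedups.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{factor:5.2f}x  {timings[flags]:>7} clocks  {flags}")
```

These are single measurements per flag set, so small differences (for example 123773 vs 124690 clocks) may well be within run-to-run noise.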