https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #47 from Andrew Roberts <andrewm.roberts at sky dot com> ---
Again with the latest snapshot:
gcc version 8.0.1 20180121

matrix.c still needs additional options to get the best out of the Ryzen
processor. With -march=znver1 -mtune=znver1 it is better than before
(223029 clocks vs 371978 originally), but 122677 is achievable with the
right options. However, the same can also be said for Haswell as things
stand. The Haswell (-march=haswell -mtune=haswell) time has risen from
190000 to 230000, but do we put that down to Meltdown/Spectre updates or
compiler updates?
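
For anyone wanting to compare these runs mechanically, here is a minimal
sketch of how the "mult took N clocks" lines could be parsed and ranked.
It is not the script used to produce the listings in this comment; the
function names (parse_results, rank, slowdown_vs_best) are made up for
illustration, and it assumes every result line has exactly the form shown
in the tables.

```python
import re

# Matches lines such as:
#   mult took 223029 clocks -march=znver1 -mtune=znver1
LINE_RE = re.compile(r"mult took (\d+) clocks -march=(\S+) -mtune=(\S+)")


def parse_results(text):
    """Return a list of (clocks, march, mtune) tuples found in text."""
    results = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            results.append((int(m.group(1)), m.group(2), m.group(3)))
    return results


def rank(results):
    """Sort fastest first (fewest clocks)."""
    return sorted(results)


def slowdown_vs_best(results, march, mtune):
    """Ratio of one combination's clocks to the best clocks in results."""
    best = min(clocks for clocks, _, _ in results)
    for clocks, a, t in results:
        if (a, t) == (march, mtune):
            return clocks / best
    raise KeyError((march, mtune))


# Three lines copied from the "-O3 on Ryzen" listing in this comment.
sample = """\
mult took 115669 clocks -march=ivybridge -mtune=skylake-avx512
mult took 223029 clocks -march=znver1 -mtune=znver1
mult took 173926 clocks -march=haswell -mtune=haswell
"""
```

Calling slowdown_vs_best(parse_results(sample), "znver1", "znver1") gives
the ratio of the native znver1 result to the fastest combination in the
sample, which is one way to put a number on "needs additional options".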

With just -O3 on Ryzen:

Top 5
mult took 115669 clocks -march=ivybridge -mtune=skylake-avx512
mult took 118403 clocks -march=corei7-avx -mtune=skylake-avx512
mult took 119379 clocks -march=core-avx-i -mtune=skylake-avx512
mult took 119735 clocks -march=corei7-avx -mtune=skylake
mult took 119901 clocks -march=sandybridge -mtune=broadwell

mult took 120023 clocks -march=sandybridge -mtune=haswell
mult took 121010 clocks -march=corei7-avx -mtune=haswell
mult took 127371 clocks -march=sandybridge -mtune=x86-64
mult took 151208 clocks -march=btver2 -mtune=generic
mult took 152360 clocks -march=ivybridge -mtune=generic
mult took 173926 clocks -march=haswell -mtune=haswell
mult took 177359 clocks -march=znver1 -mtune=athlon64
mult took 180000 clocks -march=ivybridge -mtune=znver1
mult took 188219 clocks -march=znver1 -mtune=generic
mult took 199721 clocks -march=znver1 -mtune=x86-64
mult took 223029 clocks -march=znver1 -mtune=znver1

Bot 5
mult took 377398 clocks -march=znver1 -mtune=bdver3
mult took 377650 clocks -march=knl -mtune=bdver3
mult took 378600 clocks -march=core-avx2 -mtune=bonnell
mult took 381447 clocks -march=skylake-avx512 -mtune=haswell
mult took 388837 clocks -march=skylake-avx512 -mtune=bdver4

On Haswell 

Top 5
mult took 133704 clocks -march=ivybridge -mtune=k8-sse3
mult took 150000 clocks -march=btver2 -mtune=k8
mult took 150000 clocks -march=core-avx-i -mtune=x86-64
mult took 150000 clocks -march=corei7-avx -mtune=nano
mult took 150000 clocks -march=corei7-avx -mtune=opteron

mult took 160000 clocks -march=core-avx-i -mtune=haswell
mult took 190000 clocks -march=haswell -mtune=eden-x4
mult took 190000 clocks -march=ivybridge -mtune=generic
mult took 200000 clocks -march=haswell -mtune=x86-64
mult took 230000 clocks -march=haswell -mtune=haswell
mult took 270000 clocks -march=haswell -mtune=generic

Bot 5
mult took 420000 clocks -march=skylake-avx512 -mtune=bdver2
mult took 420000 clocks -march=znver1 -mtune=bdver3
mult took 420000 clocks -march=znver1 -mtune=bdver4
mult took 430000 clocks -march=bdver2 -mtune=bdver2
mult took 430000 clocks -march=knl -mtune=bdver2

Using 
-mprefer-vector-width=none -mno-fma -mno-avx2 -O3
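
As a rough sketch of how such a sweep could be driven, the snippet below
builds one gcc command line per -march/-mtune pair with the extra options
above. It only constructs the command lists and does not run anything.
The helper name sweep_commands, the output name "matrix", and the short
example lists are assumptions for illustration, not the setup actually
used here. The two lists are kept separate because some names (for
example generic) are accepted by -mtune but not by -march.

```python
import itertools

# Extra options from this comment; swap in ["-O3"] for the plain -O3 runs.
EXTRA_FLAGS = ["-mprefer-vector-width=none", "-mno-fma", "-mno-avx2", "-O3"]


def sweep_commands(march_values, mtune_values, extra_flags,
                   source="matrix.c", output="matrix"):
    """Build one gcc argument list per (march, mtune) combination."""
    commands = []
    for march, mtune in itertools.product(march_values, mtune_values):
        commands.append(
            ["gcc", *extra_flags,
             f"-march={march}", f"-mtune={mtune}",
             source, "-o", output]
        )
    return commands


# Small example subset; the real sweep covered many more values.
example = sweep_commands(["znver1", "haswell"],
                         ["znver1", "haswell", "generic"],
                         EXTRA_FLAGS)
```

Each entry in example can be handed to subprocess.run to compile, after
which the resulting binary is timed on the target machine.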

On Ryzen
Top 5
mult took 116558 clocks -march=haswell -mtune=bdver3
mult took 116673 clocks -march=haswell -mtune=skylake
mult took 117268 clocks -march=sandybridge -mtune=skylake-avx512
mult took 117288 clocks -march=broadwell -mtune=nocona
mult took 118450 clocks -march=corei7-avx -mtune=haswell

mult took 119719 clocks -march=core-avx-i -mtune=znver1
mult took 120028 clocks -march=znver1 -mtune=skylake
mult took 122677 clocks -march=znver1 -mtune=znver1
mult took 123423 clocks -march=haswell -mtune=haswell
mult took 127388 clocks -march=skylake -mtune=x86-64
mult took 130475 clocks -march=znver1 -mtune=x86-64
mult took 132374 clocks -march=sandybridge -mtune=generic
mult took 162317 clocks -march=znver1 -mtune=generic

Bot 5
mult took 300000 clocks -march=nano-x2 -mtune=btver2
mult took 310000 clocks -march=skylake-avx512 -mtune=westmere
mult took 319772 clocks -march=knl -mtune=sandybridge
mult took 320000 clocks -march=eden-x2 -mtune=amdfam10
mult took 330000 clocks -march=atom -mtune=broadwell

On Haswell

Top 5
mult took 123148 clocks -march=bonnell -mtune=ivybridge
mult took 130262 clocks -march=ivybridge -mtune=silvermont
mult took 135299 clocks -march=core-avx2 -mtune=nano-3000
mult took 150000 clocks -march=core-avx2 -mtune=intel
mult took 150000 clocks -march=haswell -mtune=btver1

mult took 170000 clocks -march=core-avx-i -mtune=haswell
mult took 170000 clocks -march=znver1 -mtune=x86-64
mult took 180000 clocks -march=haswell -mtune=haswell
mult took 180000 clocks -march=znver1 -mtune=generic
mult took 210000 clocks -march=haswell -mtune=generic
mult took 230000 clocks -march=haswell -mtune=x86-64

Bot 5
mult took 350000 clocks -march=nano-x4 -mtune=nano-2000
mult took 350000 clocks -march=slm -mtune=skylake-avx512
mult took 360000 clocks -march=barcelona -mtune=broadwell
mult took 360000 clocks -march=nano -mtune=corei7
mult took 360000 clocks -march=nocona -mtune=btver2
