[Bug target/80568] x86 -mavx256-split-unaligned-load (and store) is affecting AVX2 code, but probably shouldn't be.

peter at cordes dot ca Tue, 02 May 2017 14:54:12 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80568


--- Comment #2 from Peter Cordes <peter at cordes dot ca> ---
Using ISA-extension options removes some microarchitectures from the set of
CPUs that can run the code, so it would be appropriate for them to have some
effect on tuning.

A "generic AVX2 CPU" is much more specific than a "generic x86-64 CPU".  For
example, rep ret is useless with -mavx, since PhenomII doesn't support AVX (or
SSE4, actually).

As it stands now, gcc doesn't have a way to tune for a "generic AVX2 CPU"
(i.e. only try to avoid problems on Haswell, Skylake, KNL, Excavator, and
Ryzen, and don't care about things that are slow on IvyBridge, Steamroller, or
Atom).

-mtune=haswell tells gcc that bsf/bsr are fast, but that's not the case on
Excavator (at least it isn't on Steamroller).  So neither -mtune=intel nor
-mtune=haswell is necessarily appropriate, especially if we're just talking
about -mavx, not -mavx2.

---

In the absence of any -mtune or -march options, -mavx could imply
-mtune=generic-avx, the way -march implies a tuning but can be overridden with
-march=foo -mtune=bar.

Or maybe the default -mtune option should be changed to -mtune=generic-isa, so
users can think of it as a tuning that looks at what -m options are enabled to
decide which uarches it can ignore.

It might be easier to maintain if those tune options are limited to only
disabling workaround-options like rep ret and splitting 256b loads/stores.

Or maybe this suggestion would already add too much maintenance work.
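
To make the generic-isa idea concrete, here is a minimal sketch of the selection
logic.  Everything in it is hypothetical: the flag names, the
generic_isa_workarounds() function, and the three-entry table are illustrative
assumptions, not gcc's real tuning data.  Each uarch lists the extensions it
supports and the workarounds it wants, and the tuning keeps a workaround only
if some CPU that can still run the code needs it.

```c
#include <stddef.h>

/* ISA extensions enabled by -m options (hypothetical bit flags). */
enum { ISA_AVX = 1u << 0, ISA_AVX2 = 1u << 1 };

/* Workaround-style tunings a generic tune might keep (hypothetical). */
enum { WA_REP_RET = 1u << 0, WA_SPLIT_256 = 1u << 1 };

struct uarch {
    const char *name;
    unsigned isa;          /* extensions this uarch supports */
    unsigned workarounds;  /* workarounds this uarch benefits from */
};

/* Illustrative table only, not gcc's actual tuning data. */
static const struct uarch uarches[] = {
    { "PhenomII",    0,                  WA_REP_RET   },
    { "SandyBridge", ISA_AVX,            WA_SPLIT_256 },
    { "Haswell",     ISA_AVX | ISA_AVX2, 0            },
};

/* Union of the workarounds wanted by every uarch that can run the code. */
unsigned generic_isa_workarounds(unsigned enabled_isa)
{
    unsigned wa = 0;
    for (size_t i = 0; i < sizeof uarches / sizeof uarches[0]; i++) {
        /* A CPU stays in the set only if it supports every enabled extension. */
        if ((uarches[i].isa & enabled_isa) == enabled_isa)
            wa |= uarches[i].workarounds;
    }
    return wa;
}
```

With this table, enabling ISA_AVX drops PhenomII and therefore rep ret, and
enabling ISA_AVX2 as well drops SandyBridge and therefore the 256b splitting.
Limiting the table to workaround-style options keeps it small.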

---

I don't know whether -mavx256-split-unaligned-load/store is still worth it if
we take SnB/IvB out of the picture.  If it helps a lot for Excavator/Zen, then
maybe.  It probably hurts for KNL, which easily bottlenecks on decode
throughput according to Agner Fog, so more instructions is definitely bad.
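
For anyone who wants to look at the instruction-count difference, a loop like
the following should do as a test case.  It is a sketch, not taken from the
original report: the function name and the file name add.c are made up.  The
pointers carry no alignment guarantee, so an auto-vectorized build has to use
unaligned 256b accesses.

```c
#include <stddef.h>

/* dst[i] = a[i] + b[i].  Nothing promises 32-byte alignment here, so a
   vectorizer targeting AVX has to emit unaligned loads and stores. */
void add_floats(float *restrict dst, const float *restrict a,
                const float *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

Compare the output of "gcc -O3 -mavx2 -S add.c" with and without
"-mno-avx256-split-unaligned-load -mno-avx256-split-unaligned-store".  With
splitting in effect, I'd expect each 256b load to show up as a 128b vmovups
plus a vinsertf128, and each store as a vmovups plus a vextractf128, instead of
a single 256b vmovups.  The exact output depends on the gcc version and its
default tuning.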

---

I didn't find any related bug reports, even when searching closed bugs for
"split unaligned load" or for -mavx256-split-unaligned-load.  I also searched
git for the commit that changed this, but didn't find anything.

Thanks for confirming that it was an intentional bugfix.
