https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84114
Wilco <wdijkstr at arm dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wdijkstr at arm dot com

--- Comment #2 from Wilco <wdijkstr at arm dot com> ---
(In reply to Steve Ellcey from comment #0)
> Created attachment 43279 [details]
> Test case
>
> The example code comes from milc in SPEC2006.
>
> GCC on x86 or aarch64 generates better code with -O3 than it does with
> -Ofast or '-O3 -ffast-math'. On x86 compiling with '-mfma -O3' I get 5
> vfmadd231sd instructions, 1 vmulsd instruction and 6 vmovsd. With '-mfma
> -Ofast' I get 3 vfmadd231sd, 2 vaddsd, 3 vmulsd, and 6 vmovsd. That is two
> extra instructions.
>
> The problem seems to be that -Ofast turns on -ffast-math and that enables
> the global reassociation pass (tree-ssa-reassoc.c) and the code changes
> done there create some temporary variables which inhibit the recognition
> and use of fma instructions.
>
> Using -O3 and -Ofast on aarch64 shows the same change.

I noticed this a while back, the reassociation pass has changed and now we get
far fewer fmas. See https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00771.html
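
To make the effect concrete, here is a minimal sketch of how reassociation can reduce the number of fmas. It is not the attached test case or the milc source. The function names dot6_chain and dot6_reassoc and the six-term sum of products are assumptions, chosen only because they reproduce the multiply/fma/add counts quoted above. The second function is hand-written to mimic the shape the reassociation pass might leave behind. It is not actual compiler output.

```c
/* Hypothetical illustration only: a six-term sum of products written two ways. */

/* Single accumulator chain. Every step has the shape acc + x*y, so if the
   compiler contracts each step into a fused multiply-add this needs
   1 multiply and 5 FMAs. */
double dot6_chain(const double *a, const double *b)
{
    double acc = a[0] * b[0];
    acc = acc + a[1] * b[1];
    acc = acc + a[2] * b[2];
    acc = acc + a[3] * b[3];
    acc = acc + a[4] * b[4];
    acc = acc + a[5] * b[5];
    return acc;
}

/* Reassociated shape: three independent partial sums held in temporaries.
   Each temporary can use only one FMA, and the temporaries are then combined
   with plain adds: 3 multiplies, 3 FMAs and 2 adds in total. */
double dot6_reassoc(const double *a, const double *b)
{
    double t0 = a[0] * b[0] + a[1] * b[1];
    double t1 = a[2] * b[2] + a[3] * b[3];
    double t2 = a[4] * b[4] + a[5] * b[5];
    return (t0 + t1) + t2;
}
```

Both functions compute the same mathematical sum, but the second form needs two more arithmetic instructions (8 rather than 6). Compiling the first function with 'gcc -O3 -mfma -S' and with 'gcc -Ofast -mfma -S' and comparing the assembly is one way to check whether a given GCC version performs this kind of rewrite. The exact output will depend on the compiler version.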