https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80283
wilco at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilco at gcc dot gnu.org

--- Comment #12 from wilco at gcc dot gnu.org ---
There are two separate issues in the ARMv7 case. The first is scheduling: the
-S output goes down from 437 lines to 305 lines with -fno-schedule-insns
(stack size 276 rather than 448 bytes). So basically the "register pressure
aware" scheduler introduces lots of unnecessary spills.

The second issue is related to the use of single-element operations within
vectors. If I change the define to do an explicit dup, e.g.
vmulq_f32((b), vdupq_n_f32(a)), I get 211 lines and no spills at all.
Switching scheduling on again gives 326 lines, so it's spilling like crazy.

Both issues seem to have been present since at least 4.8.2.
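
For readers without the testcase at hand, here is a minimal sketch of the two
forms being compared. It assumes the original define used a by-scalar multiply
such as vmulq_n_f32, which the comment does not actually state. The macro names
and the scale_both helper are hypothetical, and the non-NEON fallback exists
only so that the sketch builds on other hosts.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#else
/* Portable stand-ins (not the real intrinsics) for non-ARM hosts. */
typedef struct { float v[4]; } float32x4_t;

static inline float32x4_t vld1q_f32(const float *p)
{
    float32x4_t r;
    for (int i = 0; i < 4; i++) r.v[i] = p[i];
    return r;
}

static inline void vst1q_f32(float *p, float32x4_t x)
{
    for (int i = 0; i < 4; i++) p[i] = x.v[i];
}

static inline float32x4_t vdupq_n_f32(float a)
{
    float32x4_t r;
    for (int i = 0; i < 4; i++) r.v[i] = a;
    return r;
}

static inline float32x4_t vmulq_f32(float32x4_t x, float32x4_t y)
{
    float32x4_t r;
    for (int i = 0; i < 4; i++) r.v[i] = x.v[i] * y.v[i];
    return r;
}

static inline float32x4_t vmulq_n_f32(float32x4_t x, float a)
{
    float32x4_t r;
    for (int i = 0; i < 4; i++) r.v[i] = x.v[i] * a;
    return r;
}
#endif

/* Hypothetical names: the define in the testcase is not shown here. */
/* Assumed original form: multiply a vector by a single scalar element. */
#define SCALE_BY_SCALAR(a, b)    vmulq_n_f32((b), (a))
/* Form from the comment: broadcast the scalar first, then multiply. */
#define SCALE_EXPLICIT_DUP(a, b) vmulq_f32((b), vdupq_n_f32(a))

/* Computes in[i] * a with both forms so the results can be compared. */
void scale_both(const float in[4], float a, float out_n[4], float out_dup[4])
{
    float32x4_t b = vld1q_f32(in);
    vst1q_f32(out_n, SCALE_BY_SCALAR(a, b));
    vst1q_f32(out_dup, SCALE_EXPLICIT_DUP(a, b));
}
```

Both forms compute the same values; the difference reported above is only in
the code the compiler generates for them.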