https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81303
Wilco <wilco at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2017-07-12
                 CC|                            |wilco at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #3 from Wilco <wilco at gcc dot gnu.org> ---
Confirmed, on AArch64 bwaves is ~20% slower in SPEC2006 and ~30% slower in
SPEC2017. There are twice as many spills (outside the inner loop) and the
vectors are created in an inefficient way:

        ldr     d4, [x5,x27]
        ld1r    {v6.2d}, [x5]
        mov     v6.d[1], v4.d[0]
        add     x5, x5, x26
        fmla    v1.2d, v20.2d, v6.2d
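
For readers less familiar with the AArch64 SIMD mnemonics, the following is a rough portable C model of what the quoted sequence appears to compute. It is only a sketch and not taken from the bwaves source or from GCC: the `v2d` type, the helper names and the element-based `stride` parameter are all hypothetical (the real code uses a byte offset in x27), the pointer increment (`add x5, x5, x26`) is left out, and the fused rounding of `fmla` is not modelled. It does show that `ld1r` fills both lanes with the same value and that lane 1 is then overwritten straight away, so the replication into lane 1 does no useful work.

```c
#include <assert.h>
#include <stddef.h>

/* Two-lane double vector, standing in for an AArch64 .2d register.
   Hypothetical type, for illustration only. */
typedef struct { double lane[2]; } v2d;

/* ld1r {v.2d}, [p]: load one double and replicate it into both lanes. */
static v2d ld1r_2d(const double *p) {
    v2d v = { { *p, *p } };
    return v;
}

/* mov v.d[1], s: overwrite lane 1 with scalar s, keep lane 0. */
static v2d ins_lane1(v2d v, double s) {
    v.lane[1] = s;
    return v;
}

/* fmla acc.2d, a.2d, b.2d: acc += a * b, lane by lane.
   The real instruction is fused (single rounding); this model is not. */
static v2d fmla_2d(v2d acc, v2d a, v2d b) {
    acc.lane[0] += a.lane[0] * b.lane[0];
    acc.lane[1] += a.lane[1] * b.lane[1];
    return acc;
}

/* One step of the quoted sequence. `stride` is counted in elements here,
   which is an assumption; the assembly uses a byte offset in x27. */
static v2d strided_step(v2d acc, v2d coeff, const double *p, size_t stride) {
    assert(p != NULL);
    double hi = p[stride];          /* ldr  d4, [x5,x27]          */
    v2d v = ld1r_2d(p);             /* ld1r {v6.2d}, [x5]         */
    v = ins_lane1(v, hi);           /* mov  v6.d[1], v4.d[0]      */
    return fmla_2d(acc, coeff, v);  /* fmla v1.2d, v20.2d, v6.2d  */
}
```

Calling `strided_step(acc, coeff, mem, 2)` gathers `mem[0]` and `mem[2]` into one vector and accumulates their lane-wise product with `coeff` into `acc`.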