https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102652

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
   Target Milestone|---                         |14.0

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This is fixed in GCC 14.1.0:
```
bug:
.LFB3916:
        .cfi_startproc
        ldr     q24, [x1]
        cmlt    v27.16b, v24.16b, #0
        mov     v25.16b, v27.16b
        mov     v26.16b, v27.16b
        st4     {v24.16b - v27.16b}, [x0], 64
        ldr     q28, [x1, 16]
        cmlt    v31.16b, v28.16b, #0
        mov     v29.16b, v31.16b
        mov     v30.16b, v31.16b
        st4     {v28.16b - v31.16b}, [x0]
        ret
```

Only 4 movs in total, which are needed to duplicate v31 into v29 and v30 (and
v27 into v25/v26) for the st4 case, since st4 takes four consecutive registers.
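
For readers less familiar with st4, here is a rough scalar model of what one ldr/cmlt/mov/mov/st4 group in the listing above computes. It is only a sketch of the instruction semantics as I read them, not the test case from this bug; the helper name `model_cmlt_st4` is made up for illustration. It shows why the cmlt result has to sit in three of the four registers: st4 interleaves lane i of each register into consecutive bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the ldr + cmlt + mov + mov + st4 sequence.
   Hypothetical helper, not taken from the bug report.
   Reads n bytes from src and writes 4 * n bytes to dst. */
static void model_cmlt_st4(uint8_t *dst, const int8_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++) {
        /* cmlt ..., #0: all-ones lane if the byte is negative, else zero. */
        uint8_t mask = (src[i] < 0) ? 0xFF : 0x00;

        /* st4 stores lane i of four consecutive registers back to back. */
        dst[4 * i + 0] = (uint8_t)src[i]; /* v24 / v28: loaded data     */
        dst[4 * i + 1] = mask;            /* v25 / v29: mov copy        */
        dst[4 * i + 2] = mask;            /* v26 / v30: mov copy        */
        dst[4 * i + 3] = mask;            /* v27 / v31: cmlt result     */
    }
}
```

Calling it with n = 32 covers both halves of the listing (two 16-byte loads, 128 bytes stored). Each input byte is followed by three copies of its sign mask, so three registers must hold the same value and two of them have to be filled by mov.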