https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80570
Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |tree-optimization

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
  vect__4.5_24 = MEM <vector(8) int> [(int *)ip_12 + ivtmp.15_28 * 1];
  vect_tmp_14.6_23 = [vec_unpack_float_lo_expr] vect__4.5_24;
  vect_tmp_14.6_22 = [vec_unpack_float_hi_expr] vect__4.5_24;
  MEM <vector(4) double> [(double *)dp_10 + ivtmp.15_28 * 2] = vect_tmp_14.6_23;
  MEM <vector(4) double> [(double *)dp_10 + 32B + ivtmp.15_28 * 2] = vect_tmp_14.6_22;

Even on aarch64:
.L2:
        ldr     q0, [x1], 16
        sxtl    v1.2d, v0.2s
        sxtl2   v0.2d, v0.4s
        scvtf   v1.2d, v1.2d
        scvtf   v0.2d, v0.2d
        stp     q1, q0, [x0]
        add     x0, x0, 32
        cmp     x2, x1
        bne     .L2

But the above is decent really.
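
For readers without the original test case at hand: the dump in this comment loads one vector of ints, converts its low and high halves to double separately, and stores two double vectors. A scalar loop of roughly the following shape is the kind of code that could be vectorized that way. This is only a sketch. The pointer names ip and dp mirror the SSA names in the dump, but the function name convert_ints, the length parameter n, and the restrict qualifiers are assumptions and are not taken from the report.

```c
#include <stddef.h>

/* Hypothetical reproducer shape. ip and dp echo ip_12 and dp_10 in the
   dump; convert_ints, n, and restrict are illustrative assumptions. */
void convert_ints(double *restrict dp, const int *restrict ip, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dp[i] = (double)ip[i];  /* widening int -> double conversion */
}
```

Each int is 4 bytes and each double is 8, so one full vector of ints produces two vectors of doubles. That matches the lo/hi unpack pair in the GIMPLE and the sxtl/sxtl2 plus scvtf pairs in the aarch64 loop.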