https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88836
--- Comment #2 from kugan at gcc dot gnu.org ---
Created attachment 45795
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45795&action=edit
RFC patch

AFAIK, we need to:
1. Change the whilelo pattern in the backend
2. Change RTL CSE
   - Add support for VEC_DUPLICATE
   - When handling a PARALLEL rtx, we may kill the CSE defined in the first
     set so that it doesn't reach

The attached patch fixes this. With the patch I now have:

.LFB0:
        .cfi_startproc
        cmp     w3, 0
        ble     .L1
        sub     w4, w3, #1
        cntw    x3
        lsr     w4, w4, 1
        add     w4, w4, 1
        whilelo p0.s, xzr, x4
        .p2align 3,,7
.L3:
        ld2w    {z4.s - z5.s}, p0/z, [x1]
        ld2w    {z2.s - z3.s}, p0/z, [x2]
        add     z0.s, z4.s, z2.s
        sub     z1.s, z5.s, z3.s
        st2w    {z0.s - z1.s}, p0, [x0]
        incb    x1, all, mul #2
        whilelo p0.s, x3, x4
        incb    x0, all, mul #2
        incw    x3
        incb    x2, all, mul #2
        bne     .L3
.L1:
        ret
        .cfi_endproc
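
For readers following the listing, here is a rough C sketch of a loop that would be consistent with the assembly above. It is reconstructed from the instruction sequence (two-element structure loads, an add on the even lanes, a sub on the odd lanes, a trip count of (n - 1) / 2 + 1), not copied from the test case, so the function name `add_sub_pairs` and the parameter names are hypothetical:

```c
/* Hypothetical reconstruction of the source loop, inferred from the
   assembly only.  Elements are processed in pairs: even indices are
   added, odd indices are subtracted, which matches the ld2w/st2w
   de-interleaving with one add and one sub per iteration.
   Assumption: the arrays hold an even number of elements, since each
   iteration touches both i and i + 1.  */
void
add_sub_pairs (int *restrict dst, const int *restrict a,
               const int *restrict b, int n)
{
  for (int i = 0; i < n; i += 2)
    {
      dst[i] = a[i] + b[i];             /* even lane: add */
      dst[i + 1] = a[i + 1] - b[i + 1]; /* odd lane: sub */
    }
}
```

In the listing, the loop's bne consumes the flags set by the whilelo directly; there is no separate ptest between them.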