https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121315
--- Comment #5 from Alex Coplan <acoplan at gcc dot gnu.org> ---
So if I artificially increase the cost of the ADDRESS_REG_REG case by 1 in
aarch64_address_cost, then we get the desired codegen:

.L3:
        ldp     q31, q30, [x2], 32
        rev32   v31.16b, v31.16b
        rev32   v30.16b, v30.16b
        stp     q31, q30, [x3], 32
        cmp     x2, x0
        bne     .L3
        ret

so we could try and do this if the tuning says we should try and form LDP/STP,
but it's quite a big hammer, and will penalise cases where LDP/STP cannot be
formed, and reg+reg addressing would be beneficial.

What we really need is more information passed down from ivopts to the
address_cost hook, e.g. something that at least tells us whether the base has
multiple address uses in the loop.
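
To make the trade-off above concrete, here is a toy model of the two options. It is only a sketch and is not GCC code: AddrKind, AddrQuery, toy_address_cost and the use_ivopts_hint switch are all hypothetical names, the base cost of 0 is made up, and the real aarch64_address_cost hook has a different signature and currently receives no such information from ivopts. It also assumes, purely for illustration, that "the base has multiple address uses in the loop" is the signal that an LDP/STP could be formed.

```cpp
// Hypothetical, simplified stand-ins; none of these are GCC's real types.
enum class AddrKind { RegImm, RegReg, PostIncrement };

struct AddrQuery {
  AddrKind kind;
  bool tuning_prefers_ldp_stp;  // models "the tuning says we should try and form LDP/STP"
  bool base_has_multiple_uses;  // the extra ivopts information asked for above (assumed)
};

// Toy cost function. Every address kind starts at an invented base cost of 0.
// With use_ivopts_hint == false this is the "big hammer": reg+reg is always
// penalised by 1 when the tuning prefers LDP/STP. With use_ivopts_hint == true
// the penalty only applies when the base has several address uses in the loop.
constexpr int toy_address_cost(const AddrQuery &q, bool use_ivopts_hint) {
  const int base_cost = 0;
  if (q.kind != AddrKind::RegReg || !q.tuning_prefers_ldp_stp)
    return base_cost;
  if (!use_ivopts_hint)
    return base_cost + 1;
  return q.base_has_multiple_uses ? base_cost + 1 : base_cost;
}

// A lone reg+reg access, where no pair can be formed.
constexpr AddrQuery lone_access{AddrKind::RegReg, true, false};
// A reg+reg access whose base is used by several accesses in the loop.
constexpr AddrQuery paired_access{AddrKind::RegReg, true, true};

// The big hammer penalises both cases.
static_assert(toy_address_cost(lone_access, false) == 1, "hammer hits lone access");
static_assert(toy_address_cost(paired_access, false) == 1, "hammer hits paired access");
// With the hint, only the case that could become LDP/STP is penalised.
static_assert(toy_address_cost(lone_access, true) == 0, "lone access keeps reg+reg");
static_assert(toy_address_cost(paired_access, true) == 1, "paired access is steered away");
```

In this model the lone access keeps its cheap reg+reg form once the hint is available, which is the case the unconditional +1 would penalise.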