https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121315

--- Comment #5 from Alex Coplan <acoplan at gcc dot gnu.org> ---
So if I artificially increase the cost of the ADDRESS_REG_REG case by 1 in
aarch64_address_cost, then we get the desired codegen:

.L3:
        ldp     q31, q30, [x2], 32
        rev32   v31.16b, v31.16b
        rev32   v30.16b, v30.16b
        stp     q31, q30, [x3], 32
        cmp     x2, x0
        bne     .L3
        ret

So we could do this when the tuning says we should try to form LDP/STP, but
it's quite a big hammer: it would penalise cases where LDP/STP cannot be
formed and reg+reg addressing would be beneficial.
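
To make the trade-off concrete, here is a toy model of the experiment. It is
not the real aarch64_address_cost: the enum, the Tuning struct, the
prefer_ldp_stp flag and the cost values are all made up for illustration. It
only assumes that reg+reg and post-increment addressing start out tied, so a
bump of 1 is enough to flip the choice.

```cpp
// Toy model only: this is NOT GCC's aarch64_address_cost.
// Every name and cost value here is hypothetical.
enum class AddrKind { RegImm, RegReg, PostInc };

struct Tuning {
    bool prefer_ldp_stp;  // hypothetical: tuning asks us to try to form LDP/STP
};

// Arbitrary baseline costs; RegReg and PostInc are deliberately tied.
int base_cost(AddrKind kind) {
    switch (kind) {
    case AddrKind::RegImm:  return 0;
    case AddrKind::RegReg:  return 1;
    case AddrKind::PostInc: return 1;
    }
    return 0;
}

int address_cost(AddrKind kind, const Tuning &tuning) {
    int cost = base_cost(kind);
    // The "big hammer": bump reg+reg by 1 whenever LDP/STP is preferred,
    // without knowing whether a pair could actually be formed here.
    if (kind == AddrKind::RegReg && tuning.prefer_ldp_stp)
        cost += 1;
    return cost;
}
```

In this model the bump applies to every reg+reg address under such a tuning,
including ones that could never be paired, which is the penalty described
above.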

What we really need is more information passed down from ivopts to the
address_cost hook, e.g. something that at least tells us whether the base has
multiple address uses in the loop.
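
As a rough sketch of what that could look like, the cost function below takes
an extra argument describing how the base is used. Nothing here exists in GCC:
AddrUseInfo, AddressForm and address_cost_with_info are hypothetical names,
and gating the penalty on "more than one address use" is just one possible
reading of the idea, not a proposed interface.

```cpp
// Hypothetical sketch: none of these types or functions exist in GCC.
struct AddrUseInfo {
    // Assumed to be supplied by ivopts: number of address uses in the loop
    // that share this base.
    int base_address_uses_in_loop;
};

enum class AddressForm { RegReg, Other };

int address_cost_with_info(AddressForm form, bool prefer_ldp_stp,
                           const AddrUseInfo &info) {
    int cost = 0;  // arbitrary baseline for illustration
    // Only penalise reg+reg when the base has several address uses,
    // i.e. when there is at least a chance of forming a pair.
    if (form == AddressForm::RegReg && prefer_ldp_stp
        && info.base_address_uses_in_loop > 1)
        cost += 1;
    return cost;
}
```

With a single address use the reg+reg cost is left alone, so the cases where
reg+reg is beneficial and no pair can be formed are not penalised.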
