Hi Yury [cc'ing the ARM maintainers]

On 16/06/15 15:04, Yury Usishchev wrote:
Hello!

Following patch fixes PR target/66433.

As described in PR, cost of memory operation with autoincrement is
considered to be greater than same operation without autoincrement. This
causes auto-inc-dec pass not to optimize vector memory operations like
vld and vst.

The autoincrement form may not always be as cheap as a
simple memory op, since it does involve an implicit addition
operation.

I've tried out your patch and I do see the autoincrement forms
being used more aggressively. Do you have any benchmark data
for making this change?



Bootstrapped and regtested on armv7l-linux-gnueabi on trunk.
OK for trunk?

     case MEM:
       /* A memory access costs 1 insn if the mode is small, or the address is
         a single register, otherwise it costs one insn per word.  */
-      if (REG_P (XEXP (x, 0)))
+      if (REG_P (XEXP (x, 0))
+         || (GET_RTX_CLASS (GET_CODE (XEXP (x, 0))) == RTX_AUTOINC
+             && REG_P (XEXP (XEXP (x, 0), 0))))
        *cost = COSTS_N_INSNS (1);
       else if (flag_pic
               && GET_CODE (XEXP (x, 0)) == PLUS


I would have hoped that auto-inc-dec.c would be using address costs rather than 
rtx costs
here, but I don't think it's well defined who is responsible for choosing 
preferences between
these autoinc ops :(
I note that in our arm_arm_address_cost we already consider the autoinc modes 
to be cheap.

One situation that we want to avoid is for non-NEON memory ops sequences of the 
form:
ldr ra, [rn, #4]
ldr rb, [rn, #8]
ldr rc, [rn, #12]
add rn, rn, #16

being transformed into:
ldr ra, [rn]!
ldr rb, [rn]!
ldr rc, [rn]!

So I think at least for non-vector/FP modes where we can use offsets we should 
consider
autoinc ops to be slightly more expensive (COSTS_N_INSNS (2) instead of 
COSTS_N_INSNS (1)).

But when optimising for size, we should prefer the autoinc forms since they can 
save us on
add/sub instructions.

Thanks,
Kyrill


--
BR,
Yury Usishchev


Reply via email to