Hi Yury [cc'ing the ARM maintainers]
On 16/06/15 15:04, Yury Usishchev wrote:
> Hello!
>
> The following patch fixes PR target/66433.
>
> As described in the PR, the cost of a memory operation with autoincrement
> is considered to be greater than that of the same operation without
> autoincrement. This causes the auto-inc-dec pass not to optimize vector
> memory operations like vld and vst.
The autoincrement form may not always be as cheap as a
simple memory op, since it does involve an implicit addition
operation.
I've tried out your patch and I do see the autoincrement forms
being used more aggressively. Do you have any benchmark data
for making this change?
> Bootstrapped and regtested on armv7l-linux-gnueabi on trunk.
> OK for trunk?
>      case MEM:
>        /* A memory access costs 1 insn if the mode is small, or the address is
>           a single register, otherwise it costs one insn per word. */
> -      if (REG_P (XEXP (x, 0)))
> +      if (REG_P (XEXP (x, 0))
> +          || (GET_RTX_CLASS (GET_CODE (XEXP (x, 0))) == RTX_AUTOINC
> +              && REG_P (XEXP (XEXP (x, 0), 0))))
>          *cost = COSTS_N_INSNS (1);
>        else if (flag_pic
>                 && GET_CODE (XEXP (x, 0)) == PLUS
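To make the effect of the hunk above concrete, here is a toy C model of the check. It is only a sketch: `addr_kind` and `toy_mem_cost` are made-up names that stand in for the RTL address shape and the MEM case, and the fallback branch collapses the remaining cases (small modes, PIC) into "one insn per word". `COSTS_N_INSNS` is defined with the same factor of 4 that GCC uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Same scaling GCC uses for rtx costs: one insn is 4 cost units.  */
#define COSTS_N_INSNS(n) ((n) * 4)

/* Hypothetical stand-in for the shape of a MEM address; not a GCC type.  */
enum addr_kind
{
  ADDR_REG,         /* plain register, e.g. [rn] */
  ADDR_AUTOINC_REG, /* auto-inc/dec whose base is a plain register */
  ADDR_OTHER        /* anything else, e.g. register plus offset */
};

/* Toy version of the MEM case.  WORDS is the number of words accessed.
   With PATCHED false only a plain register address gets the 1-insn cost;
   with PATCHED true an autoinc of a register gets it as well.  */
static int
toy_mem_cost (enum addr_kind kind, int words, bool patched)
{
  assert (words > 0);
  if (kind == ADDR_REG || (patched && kind == ADDR_AUTOINC_REG))
    return COSTS_N_INSNS (1);
  /* Simplification: the real code distinguishes further cases here.  */
  return COSTS_N_INSNS (words);
}
```

In this model a four-word access through an autoinc address costs `COSTS_N_INSNS (4)` before the patch and `COSTS_N_INSNS (1)` after it, the same as the plain register form, which is the kind of comparison that would let auto-inc-dec accept the transformation.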
I would have hoped that auto-inc-dec.c would be using address costs rather
than rtx costs here, but I don't think it's well defined who is responsible
for choosing preferences between these autoinc ops :(
I note that in our arm_arm_address_cost we already consider the autoinc modes
to be cheap.
One situation that we want to avoid is non-NEON memory op sequences of the
form:
ldr ra, [rn, #4]
ldr rb, [rn, #8]
ldr rc, [rn, #12]
add rn, rn, #16
being transformed into:
ldr ra, [rn]!
ldr rb, [rn]!
ldr rc, [rn]!
So I think at least for non-vector/FP modes where we can use offsets we
should consider autoinc ops to be slightly more expensive
(COSTS_N_INSNS (2) instead of COSTS_N_INSNS (1)).
But when optimising for size, we should prefer the autoinc forms since they
can save us on add/sub instructions.
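The policy suggested above could be sketched roughly as follows. This is a standalone toy in C, not a proposed change to arm.c: `struct toy_access` and `toy_autoinc_policy_cost` are hypothetical names, and the three flags stand in for whatever the real cost function would derive from the rtx, the mode, and the speed/size setting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same scaling GCC uses for rtx costs: one insn is 4 cost units.  */
#define COSTS_N_INSNS(n) ((n) * 4)

/* Hypothetical description of a register-based memory access.  */
struct toy_access
{
  bool autoinc;       /* address is an auto-inc/dec of a register */
  bool vector_or_fp;  /* access is in a vector or FP mode */
  bool optimize_size; /* optimising for size rather than speed */
};

/* Cost of the access under the suggested policy.  */
static int
toy_autoinc_policy_cost (const struct toy_access *a)
{
  assert (a != NULL);
  if (!a->autoinc)
    return COSTS_N_INSNS (1); /* plain register address */
  if (a->vector_or_fp || a->optimize_size)
    return COSTS_N_INSNS (1); /* autoinc as cheap as the plain form */
  /* Core-register access, optimising for speed: slightly dearer, so
     that offset addressing is preferred where it is available.  */
  return COSTS_N_INSNS (2);
}
```

With these numbers an integer autoinc load costs 8 when optimising for speed and 4 when optimising for size, while vector/FP autoinc accesses always cost 4.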
Thanks,
Kyrill
> --
> BR,
> Yury Usishchev