Hi all, The most common way to load and store TImode value in aarch64 is to perform an LDP/STP of two X-registers. This is the *movti_aarch64 pattern in aarch64.md. There is a bug in the logic in aarch64_classify_address where it validates the offset in the address used to load a TImode value. It passes down TImode to the aarch64_offset_7bit_signed_scaled_p check which rejects offsets that are not a multiple of the mode size of TImode (16). However, this is too conservative as X-reg LDP/STP instructions accept immediate offsets that are a multiple of 8.
Also, considering that the definition of aarch64_offset_7bit_signed_scaled_p is: return (offset >= -64 * GET_MODE_SIZE (mode) && offset < 64 * GET_MODE_SIZE (mode) && offset % GET_MODE_SIZE (mode) == 0); I think the range check may even be wrong for TImode as this will accept offsets in the range [-1024, 1024) (as long as they are a multiple of 16) whereas X-reg LDP/STP instructions only accept offsets in the range [-512, 512). So since the check is for an X-reg LDP/STP address we should be passing down DImode. This patch does that and enables more aggressive generation of REG+IMM addressing modes for 64-bit aligned TImode values, eliminating many address calculation instructions. For the testcase in the patch we currently generate: bar: add x1, x1, 8 add x0, x0, 8 ldp x2, x3, [x1] stp x2, x3, [x0] ret whereas with this patch we generate: bar: ldp x2, x3, [x1, 8] stp x2, x3, [x0, 8] ret Bootstrapped and tested on aarch64-none-linux-gnu. Ok for trunk? Thanks, Kyrill 2016-07-13 Kyrylo Tkachov <kyrylo.tkac...@arm.com> * config/aarch64/aarch64.c (aarch64_classify_address): Use DImode when performing aarch64_offset_7bit_signed_scaled_p check for TImode LDP/STP addresses. 2016-07-13 Kyrylo Tkachov <kyrylo.tkac...@arm.com> * gcc.target/aarch64/ldp_stp_unaligned_1.c: New test.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index bea67f88b900be39b6f1ae002353b44c5a4a9f7d..8fd93a54c54ab86c6e600afba48fa441101b57c7 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -4033,9 +4033,11 @@ aarch64_classify_address (struct aarch64_address_info *info, X,X: 7-bit signed scaled offset Q: 9-bit signed offset We conservatively require an offset representable in either mode. - */ + When performing the check for pairs of X registers i.e. LDP/STP + pass down DImode since that is the natural size of the LDP/STP + instruction memory accesses. */ if (mode == TImode || mode == TFmode) - return (aarch64_offset_7bit_signed_scaled_p (mode, offset) + return (aarch64_offset_7bit_signed_scaled_p (DImode, offset) && offset_9bit_signed_unscaled_p (mode, offset)); /* A 7bit offset check because OImode will emit a ldp/stp diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_unaligned_1.c b/gcc/testsuite/gcc.target/aarch64/ldp_stp_unaligned_1.c new file mode 100644 index 0000000000000000000000000000000000000000..a70f92100fb91bcfdcfd4af1cab6f58915038568 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_unaligned_1.c @@ -0,0 +1,20 @@ +/* { dg-options "-O2" } */ + +/* Check that we can use a REG + IMM addressing mode when moving an unaligned + TImode value to and from memory. */ + +struct foo +{ + long long b; + __int128 a; +} __attribute__ ((packed)); + +void +bar (struct foo *p, struct foo *q) +{ + p->a = q->a; +} + +/* { dg-final { scan-assembler-not "add\tx\[0-9\]+, x\[0-9\]+" } } */ +/* { dg-final { scan-assembler-times "ldp\tx\[0-9\]+, x\[0-9\], .*8" 1 } } */ +/* { dg-final { scan-assembler-times "stp\tx\[0-9\]+, x\[0-9\], .*8" 1 } } */