On 15/02/17 15:03, Kyrill Tkachov wrote: > Hi Richard, > > On 15/02/17 15:00, Richard Earnshaw (lists) wrote: >> On 03/02/17 17:12, Kyrill Tkachov wrote: >>> Hi all, >>> >>> While evaluating Maxim's SW prefetch patches [1] I noticed that the >>> aarch64 prefetch pattern is >>> overly restrictive in its address operand. It only accepts simple >>> register addressing modes. >>> In fact, the PRFM instruction accepts almost all modes that a normal >>> 64-bit LDR supports. >>> The restriction in the pattern leads to explicit address calculation >>> code to be emitted which we could avoid. >>> >>> This patch relaxes the restrictions on the prefetch define_insn. It >>> creates a predicate and constraint that >>> allow the full addressing modes that PRFM allows. Thus for the testcase >>> in the patch (adapted from one of the existing >>> __builtin_prefetch tests in the testsuite) we can generate a: >>> prfm PLDL1STRM, [x1, 8] >>> >>> instead of the current >>> prfm PLDL1STRM, [x1] >>> with an explicit increment of x1 by 8 in a separate instruction. >>> >>> I've removed the %a output modifier in the output template and wrapped >>> the address operand into a DImode MEM before >>> passing it down to aarch64_print_operand. >>> >>> This is because operand 0 is an address operand rather than a memory >>> operand and thus doesn't have a mode associated >>> with it. When processing the 'a' output modifier the code in final.c >>> will call TARGET_PRINT_OPERAND_ADDRESS with a VOIDmode >>> argument. This will ICE on aarch64 because we need a mode for the >>> memory in order for aarch64_classify_address to work >>> correctly. Rather than overriding the VOIDmode in >>> aarch64_print_operand_address I decided to instead create the DImode >>> MEM in the "prefetch" output template and treat it as a normal 64-bit >>> memory address, which at the point of assembly output >>> is what it is anyway. >>> >>> With this patch I see a reduction in instruction count in the SPEC2006 >>> benchmarks when SW prefetching is enabled on top >>> of Maxim's patchset because fewer address calculation instructions are >>> emitted due to the use of the more expressive >>> addressing modes. It also fixes a performance regression that I observed >>> in 410.bwaves from Maxim's patches on Cortex-A72. >>> I'll be running a full set of benchmarks to evaluate this further, but I >>> think this is the right thing to do. >>> >>> Bootstrapped and tested on aarch64-none-linux-gnu. >>> >>> Maxim, do you want to try this on top of your patches on your hardware >>> to see if it helps with the regressions you mentioned? >>> >>> Thanks, >>> Kyrill >>> >>> >>> [1] https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02284.html >>> >>> 2016-02-03 Kyrylo Tkachov <kyrylo.tkac...@arm.com> >>> >>> * config/aarch64/aarch64.md (prefetch); Adjust predicate and >>> constraint on operand 0 to allow more general addressing modes. >>> Adjust output template. >>> * config/aarch64/aarch64.c (aarch64_address_valid_for_prefetch_p): >>> New function. >>> * config/aarch64/aarch64-protos.h >>> (aarch64_address_valid_for_prefetch_p): Declare prototype. >>> * config/aarch64/constraints.md (Dp): New address constraint. >>> * config/aarch64/predicates.md (aarch64_prefetch_operand): New >>> predicate. >>> >>> 2016-02-03 Kyrylo Tkachov <kyrylo.tkac...@arm.com> >>> >>> * gcc.target/aarch64/prfm_imm_offset_1.c: New test. >>> >>> aarch64-prfm-imm.patch >>> >> Hmm, I'm not sure about this. rtl.texi says that a prefetch code >> contains an address, not a MEM. So it's theoretically possible for >> generic code to want to look inside the first operand and find an >> address directly. This change would break that assumption. > > With this change the prefetch operand is still an address, not a MEM > during all the > optimisation passes. > It's wrapped in a MEM only during the ultimate printing of the assembly > string > during 'final'. >
Ah! I'd missed that. This is OK for stage1. R. > Kyrill > >> R. >> >>> commit a324e2f2ea243fe42f23a026ecbe1435876e2c8b >>> Author: Kyrylo Tkachov <kyrylo.tkac...@arm.com> >>> Date: Thu Feb 2 14:46:11 2017 +0000 >>> >>> [AArch64] Accept more addressing modes for PRFM >>> >>> diff --git a/gcc/config/aarch64/aarch64-protos.h >>> b/gcc/config/aarch64/aarch64-protos.h >>> index babc327..61706de 100644 >>> --- a/gcc/config/aarch64/aarch64-protos.h >>> +++ b/gcc/config/aarch64/aarch64-protos.h >>> @@ -300,6 +300,7 @@ extern struct tune_params aarch64_tune_params; >>> HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, >>> unsigned); >>> int aarch64_get_condition_code (rtx); >>> +bool aarch64_address_valid_for_prefetch_p (rtx, bool); >>> bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode); >>> unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT val_in); >>> unsigned HOST_WIDE_INT aarch64_and_split_imm2 (HOST_WIDE_INT val_in); >>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c >>> index acc093a..c05eff3 100644 >>> --- a/gcc/config/aarch64/aarch64.c >>> +++ b/gcc/config/aarch64/aarch64.c >>> @@ -4549,6 +4549,24 @@ aarch64_classify_address (struct >>> aarch64_address_info *info, >>> } >>> } >>> +/* Return true if the address X is valid for a PRFM instruction. >>> + STRICT_P is true if we should do strict checking with >>> + aarch64_classify_address. */ >>> + >>> +bool >>> +aarch64_address_valid_for_prefetch_p (rtx x, bool strict_p) >>> +{ >>> + struct aarch64_address_info addr; >>> + >>> + /* PRFM accepts the same addresses as DImode... */ >>> + bool res = aarch64_classify_address (&addr, x, DImode, MEM, >>> strict_p); >>> + if (!res) >>> + return false; >>> + >>> + /* ... except writeback forms. */ >>> + return addr.type != ADDRESS_REG_WB; >>> +} >>> + >>> bool >>> aarch64_symbolic_address_p (rtx x) >>> { >>> diff --git a/gcc/config/aarch64/aarch64.md >>> b/gcc/config/aarch64/aarch64.md >>> index b9e8edf..c6201a5 100644 >>> --- a/gcc/config/aarch64/aarch64.md >>> +++ b/gcc/config/aarch64/aarch64.md >>> @@ -518,27 +518,31 @@ (define_insn "nop" >>> ) >>> (define_insn "prefetch" >>> - [(prefetch (match_operand:DI 0 "register_operand" "r") >>> + [(prefetch (match_operand:DI 0 "aarch64_prefetch_operand" "Dp") >>> (match_operand:QI 1 "const_int_operand" "") >>> (match_operand:QI 2 "const_int_operand" ""))] >>> "" >>> { >>> - const char * pftype[2][4] = >>> + const char * pftype[2][4] = >>> { >>> - {"prfm\\tPLDL1STRM, %a0", >>> - "prfm\\tPLDL3KEEP, %a0", >>> - "prfm\\tPLDL2KEEP, %a0", >>> - "prfm\\tPLDL1KEEP, %a0"}, >>> - {"prfm\\tPSTL1STRM, %a0", >>> - "prfm\\tPSTL3KEEP, %a0", >>> - "prfm\\tPSTL2KEEP, %a0", >>> - "prfm\\tPSTL1KEEP, %a0"}, >>> + {"prfm\\tPLDL1STRM, %0", >>> + "prfm\\tPLDL3KEEP, %0", >>> + "prfm\\tPLDL2KEEP, %0", >>> + "prfm\\tPLDL1KEEP, %0"}, >>> + {"prfm\\tPSTL1STRM, %0", >>> + "prfm\\tPSTL3KEEP, %0", >>> + "prfm\\tPSTL2KEEP, %0", >>> + "prfm\\tPSTL1KEEP, %0"}, >>> }; >>> int locality = INTVAL (operands[2]); >>> gcc_assert (IN_RANGE (locality, 0, 3)); >>> + /* PRFM accepts the same addresses as a 64-bit LDR so wrap >>> + the address into a DImode MEM so that aarch64_print_operand >>> knows >>> + how to print it. */ >>> + operands[0] = gen_rtx_MEM (DImode, operands[0]); >>> return pftype[INTVAL(operands[1])][locality]; >>> } >>> [(set_attr "type" "load1")] >>> diff --git a/gcc/config/aarch64/constraints.md >>> b/gcc/config/aarch64/constraints.md >>> index 5a252c0..b829337 100644 >>> --- a/gcc/config/aarch64/constraints.md >>> +++ b/gcc/config/aarch64/constraints.md >>> @@ -214,3 +214,8 @@ (define_constraint "Dd" >>> A constraint that matches an immediate operand valid for AdvSIMD >>> scalar." >>> (and (match_code "const_int") >>> (match_test "aarch64_simd_imm_scalar_p (op, GET_MODE (op))"))) >>> + >>> +(define_address_constraint "Dp" >>> + "@internal >>> + An address valid for a prefetch instruction." >>> + (match_test "aarch64_address_valid_for_prefetch_p (op, true)")) >>> diff --git a/gcc/config/aarch64/predicates.md >>> b/gcc/config/aarch64/predicates.md >>> index e83d45b..8e3ea9b 100644 >>> --- a/gcc/config/aarch64/predicates.md >>> +++ b/gcc/config/aarch64/predicates.md >>> @@ -165,6 +165,9 @@ (define_predicate "aarch64_mem_pair_operand" >>> (match_test "aarch64_legitimate_address_p (mode, XEXP (op, >>> 0), PARALLEL, >>> 0)"))) >>> +(define_predicate "aarch64_prefetch_operand" >>> + (match_test "aarch64_address_valid_for_prefetch_p (op, false)")) >>> + >>> (define_predicate "aarch64_valid_symref" >>> (match_code "const, symbol_ref, label_ref") >>> { >>> diff --git a/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c >>> b/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c >>> new file mode 100644 >>> index 0000000..26ab913 >>> --- /dev/null >>> +++ b/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c >>> @@ -0,0 +1,18 @@ >>> +/* { dg-do compile } */ >>> +/* { dg-options "-O2" } */ >>> + >>> +/* Check that we can generate the immediate-offset addressing >>> + mode for PRFM. */ >>> + >>> +#define ARRSIZE 65 >>> +int *bad_addr[ARRSIZE]; >>> + >>> +void >>> +prefetch_for_read (void) >>> +{ >>> + int i; >>> + for (i = 0; i < ARRSIZE; i++) >>> + __builtin_prefetch (bad_addr[i] + 2, 0, 0); >>> +} >>> + >>> +/* { dg-final { scan-assembler-times "prfm.*\\\[x\[0-9\]+, 8\\\]" 1 >>> } } */ >>> >