On 03/02/17 17:12, Kyrill Tkachov wrote: > Hi all, > > While evaluating Maxim's SW prefetch patches [1] I noticed that the > aarch64 prefetch pattern is > overly restrictive in its address operand. It only accepts simple > register addressing modes. > In fact, the PRFM instruction accepts almost all modes that a normal > 64-bit LDR supports. > The restriction in the pattern leads to explicit address calculation > code to be emitted which we could avoid. > > This patch relaxes the restrictions on the prefetch define_insn. It > creates a predicate and constraint that > allow the full addressing modes that PRFM allows. Thus for the testcase > in the patch (adapted from one of the existing > __builtin_prefetch tests in the testsuite) we can generate a: > prfm PLDL1STRM, [x1, 8] > > instead of the current > prfm PLDL1STRM, [x1] > with an explicit increment of x1 by 8 in a separate instruction. > > I've removed the %a output modifier in the output template and wrapped > the address operand into a DImode MEM before > passing it down to aarch64_print_operand. > > This is because operand 0 is an address operand rather than a memory > operand and thus doesn't have a mode associated > with it. When processing the 'a' output modifier the code in final.c > will call TARGET_PRINT_OPERAND_ADDRESS with a VOIDmode > argument. This will ICE on aarch64 because we need a mode for the > memory in order for aarch64_classify_address to work > correctly. Rather than overriding the VOIDmode in > aarch64_print_operand_address I decided to instead create the DImode > MEM in the "prefetch" output template and treat it as a normal 64-bit > memory address, which at the point of assembly output > is what it is anyway. > > With this patch I see a reduction in instruction count in the SPEC2006 > benchmarks when SW prefetching is enabled on top > of Maxim's patchset because fewer address calculation instructions are > emitted due to the use of the more expressive > addressing modes. It also fixes a performance regression that I observed > in 410.bwaves from Maxim's patches on Cortex-A72. > I'll be running a full set of benchmarks to evaluate this further, but I > think this is the right thing to do. > > Bootstrapped and tested on aarch64-none-linux-gnu. > > Maxim, do you want to try this on top of your patches on your hardware > to see if it helps with the regressions you mentioned? > > Thanks, > Kyrill > > > [1] https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02284.html > > 2016-02-03 Kyrylo Tkachov <kyrylo.tkac...@arm.com> > > * config/aarch64/aarch64.md (prefetch); Adjust predicate and > constraint on operand 0 to allow more general addressing modes. > Adjust output template. > * config/aarch64/aarch64.c (aarch64_address_valid_for_prefetch_p): > New function. > * config/aarch64/aarch64-protos.h > (aarch64_address_valid_for_prefetch_p): Declare prototype. > * config/aarch64/constraints.md (Dp): New address constraint. > * config/aarch64/predicates.md (aarch64_prefetch_operand): New > predicate. > > 2016-02-03 Kyrylo Tkachov <kyrylo.tkac...@arm.com> > > * gcc.target/aarch64/prfm_imm_offset_1.c: New test. > > aarch64-prfm-imm.patch >
Hmm, I'm not sure about this. rtl.texi says that a prefetch code contains an address, not a MEM. So it's theoretically possible for generic code to want to look inside the first operand and find an address directly. This change would break that assumption. R. > > commit a324e2f2ea243fe42f23a026ecbe1435876e2c8b > Author: Kyrylo Tkachov <kyrylo.tkac...@arm.com> > Date: Thu Feb 2 14:46:11 2017 +0000 > > [AArch64] Accept more addressing modes for PRFM > > diff --git a/gcc/config/aarch64/aarch64-protos.h > b/gcc/config/aarch64/aarch64-protos.h > index babc327..61706de 100644 > --- a/gcc/config/aarch64/aarch64-protos.h > +++ b/gcc/config/aarch64/aarch64-protos.h > @@ -300,6 +300,7 @@ extern struct tune_params aarch64_tune_params; > > HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned); > int aarch64_get_condition_code (rtx); > +bool aarch64_address_valid_for_prefetch_p (rtx, bool); > bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode); > unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT val_in); > unsigned HOST_WIDE_INT aarch64_and_split_imm2 (HOST_WIDE_INT val_in); > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index acc093a..c05eff3 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -4549,6 +4549,24 @@ aarch64_classify_address (struct aarch64_address_info > *info, > } > } > > +/* Return true if the address X is valid for a PRFM instruction. > + STRICT_P is true if we should do strict checking with > + aarch64_classify_address. */ > + > +bool > +aarch64_address_valid_for_prefetch_p (rtx x, bool strict_p) > +{ > + struct aarch64_address_info addr; > + > + /* PRFM accepts the same addresses as DImode... */ > + bool res = aarch64_classify_address (&addr, x, DImode, MEM, strict_p); > + if (!res) > + return false; > + > + /* ... except writeback forms. */ > + return addr.type != ADDRESS_REG_WB; > +} > + > bool > aarch64_symbolic_address_p (rtx x) > { > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md > index b9e8edf..c6201a5 100644 > --- a/gcc/config/aarch64/aarch64.md > +++ b/gcc/config/aarch64/aarch64.md > @@ -518,27 +518,31 @@ (define_insn "nop" > ) > > (define_insn "prefetch" > - [(prefetch (match_operand:DI 0 "register_operand" "r") > + [(prefetch (match_operand:DI 0 "aarch64_prefetch_operand" "Dp") > (match_operand:QI 1 "const_int_operand" "") > (match_operand:QI 2 "const_int_operand" ""))] > "" > { > - const char * pftype[2][4] = > + const char * pftype[2][4] = > { > - {"prfm\\tPLDL1STRM, %a0", > - "prfm\\tPLDL3KEEP, %a0", > - "prfm\\tPLDL2KEEP, %a0", > - "prfm\\tPLDL1KEEP, %a0"}, > - {"prfm\\tPSTL1STRM, %a0", > - "prfm\\tPSTL3KEEP, %a0", > - "prfm\\tPSTL2KEEP, %a0", > - "prfm\\tPSTL1KEEP, %a0"}, > + {"prfm\\tPLDL1STRM, %0", > + "prfm\\tPLDL3KEEP, %0", > + "prfm\\tPLDL2KEEP, %0", > + "prfm\\tPLDL1KEEP, %0"}, > + {"prfm\\tPSTL1STRM, %0", > + "prfm\\tPSTL3KEEP, %0", > + "prfm\\tPSTL2KEEP, %0", > + "prfm\\tPSTL1KEEP, %0"}, > }; > > int locality = INTVAL (operands[2]); > > gcc_assert (IN_RANGE (locality, 0, 3)); > > + /* PRFM accepts the same addresses as a 64-bit LDR so wrap > + the address into a DImode MEM so that aarch64_print_operand knows > + how to print it. */ > + operands[0] = gen_rtx_MEM (DImode, operands[0]); > return pftype[INTVAL(operands[1])][locality]; > } > [(set_attr "type" "load1")] > diff --git a/gcc/config/aarch64/constraints.md > b/gcc/config/aarch64/constraints.md > index 5a252c0..b829337 100644 > --- a/gcc/config/aarch64/constraints.md > +++ b/gcc/config/aarch64/constraints.md > @@ -214,3 +214,8 @@ (define_constraint "Dd" > A constraint that matches an immediate operand valid for AdvSIMD scalar." > (and (match_code "const_int") > (match_test "aarch64_simd_imm_scalar_p (op, GET_MODE (op))"))) > + > +(define_address_constraint "Dp" > + "@internal > + An address valid for a prefetch instruction." > + (match_test "aarch64_address_valid_for_prefetch_p (op, true)")) > diff --git a/gcc/config/aarch64/predicates.md > b/gcc/config/aarch64/predicates.md > index e83d45b..8e3ea9b 100644 > --- a/gcc/config/aarch64/predicates.md > +++ b/gcc/config/aarch64/predicates.md > @@ -165,6 +165,9 @@ (define_predicate "aarch64_mem_pair_operand" > (match_test "aarch64_legitimate_address_p (mode, XEXP (op, 0), > PARALLEL, > 0)"))) > > +(define_predicate "aarch64_prefetch_operand" > + (match_test "aarch64_address_valid_for_prefetch_p (op, false)")) > + > (define_predicate "aarch64_valid_symref" > (match_code "const, symbol_ref, label_ref") > { > diff --git a/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c > b/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c > new file mode 100644 > index 0000000..26ab913 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c > @@ -0,0 +1,18 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +/* Check that we can generate the immediate-offset addressing > + mode for PRFM. */ > + > +#define ARRSIZE 65 > +int *bad_addr[ARRSIZE]; > + > +void > +prefetch_for_read (void) > +{ > + int i; > + for (i = 0; i < ARRSIZE; i++) > + __builtin_prefetch (bad_addr[i] + 2, 0, 0); > +} > + > +/* { dg-final { scan-assembler-times "prfm.*\\\[x\[0-9\]+, 8\\\]" 1 } } */ >