On 15/02/17 15:03, Kyrill Tkachov wrote:
> Hi Richard,
> 
> On 15/02/17 15:00, Richard Earnshaw (lists) wrote:
>> On 03/02/17 17:12, Kyrill Tkachov wrote:
>>> Hi all,
>>>
>>> While evaluating Maxim's SW prefetch patches [1] I noticed that the
>>> aarch64 prefetch pattern is
>>> overly restrictive in its address operand. It only accepts simple
>>> register addressing modes.
>>> In fact, the PRFM instruction accepts almost all modes that a normal
>>> 64-bit LDR supports.
>>> The restriction in the pattern causes explicit address-calculation code
>>> to be emitted that we could otherwise avoid.
>>>
>>> This patch relaxes the restrictions on the prefetch define_insn.  It adds
>>> a predicate and constraint that accept the full set of addressing modes
>>> that PRFM supports.  Thus for the testcase in the patch (adapted from one
>>> of the existing __builtin_prefetch tests in the testsuite) we can generate:
>>> prfm    PLDL1STRM, [x1, 8]
>>>
>>> instead of the current
>>> prfm    PLDL1STRM, [x1]
>>> with an explicit increment of x1 by 8 in a separate instruction.
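>>>
>>> As a concrete illustration, for a (hypothetical) snippet like:
>>>
>>>   void
>>>   foo (int *p)
>>>   {
>>>     /* Hypothetical example: prefetch for read, no temporal locality.  */
>>>     __builtin_prefetch (p + 2, 0, 0);
>>>   }
>>>
>>> we previously needed an explicit address calculation along the lines of:
>>>
>>>   add     x0, x0, 8
>>>   prfm    PLDL1STRM, [x0]
>>>
>>> whereas with the relaxed pattern the offset folds into the PRFM itself:
>>>
>>>   prfm    PLDL1STRM, [x0, 8]
>>>
>>> (Exact register allocation and scheduling will of course vary.)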
>>>
>>> I've removed the %a output modifier from the output template and wrapped
>>> the address operand in a DImode MEM before passing it down to
>>> aarch64_print_operand.
>>>
>>> This is because operand 0 is an address operand rather than a memory
>>> operand and thus doesn't have a mode associated with it.  When processing
>>> the 'a' output modifier, the code in final.c calls
>>> TARGET_PRINT_OPERAND_ADDRESS with a VOIDmode argument.  This ICEs on
>>> aarch64 because aarch64_classify_address needs a mode for the memory in
>>> order to work correctly.  Rather than overriding the VOIDmode in
>>> aarch64_print_operand_address, I decided instead to create the DImode MEM
>>> in the "prefetch" output template and treat it as a normal 64-bit memory
>>> address, which at the point of assembly output is what it is anyway.
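>>>
>>> In other words, the output template now does something along these lines
>>> (see the aarch64.md hunk below for the exact code):
>>>
>>>   /* Give the bare address a mode by wrapping it in a DImode MEM,
>>>      since PRFM accepts the same addresses as a 64-bit LDR.  */
>>>   operands[0] = gen_rtx_MEM (DImode, operands[0]);
>>>
>>> so aarch64_print_operand just sees an ordinary DImode memory operand and
>>> no VOIDmode special-casing is needed in aarch64_print_operand_address.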
>>>
>>> With this patch I see a reduction in instruction count in the SPEC2006
>>> benchmarks when SW prefetching is enabled on top of Maxim's patchset,
>>> because the more expressive addressing modes mean fewer address
>>> calculation instructions are emitted.  It also fixes a performance
>>> regression from Maxim's patches that I observed in 410.bwaves on
>>> Cortex-A72.
>>> I'll be running a full set of benchmarks to evaluate this further, but I
>>> think this is the right thing to do.
>>>
>>> Bootstrapped and tested on aarch64-none-linux-gnu.
>>>
>>> Maxim, do you want to try this on top of your patches on your hardware
>>> to see if it helps with the regressions you mentioned?
>>>
>>> Thanks,
>>> Kyrill
>>>
>>>
>>> [1] https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02284.html
>>>
>>> 2017-02-03  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>>>
>>>      * config/aarch64/aarch64.md (prefetch): Adjust predicate and
>>>      constraint on operand 0 to allow more general addressing modes.
>>>      Adjust output template.
>>>      * config/aarch64/aarch64.c (aarch64_address_valid_for_prefetch_p):
>>>      New function.
>>>      * config/aarch64/aarch64-protos.h
>>>      (aarch64_address_valid_for_prefetch_p): Declare prototype.
>>>      * config/aarch64/constraints.md (Dp): New address constraint.
>>>      * config/aarch64/predicates.md (aarch64_prefetch_operand): New
>>>      predicate.
>>>
>>> 2017-02-03  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>>>
>>>      * gcc.target/aarch64/prfm_imm_offset_1.c: New test.
>>>
>>> aarch64-prfm-imm.patch
>>>
>> Hmm, I'm not sure about this.  rtl.texi says that a prefetch code
>> contains an address, not a MEM.  So it's theoretically possible for
>> generic code to want to look inside the first operand and find an
>> address directly.  This change would break that assumption.
> 
> With this change the prefetch operand is still an address, not a MEM,
> throughout all the optimisation passes.  It is wrapped in a MEM only at the
> very end, when the assembly string is printed out during 'final'.
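>
> For reference, during the RTL passes the insn pattern still looks roughly
> like (register numbers illustrative):
>
>   (prefetch (plus:DI (reg:DI 1 x1)
>                      (const_int 8))
>             (const_int 0)
>             (const_int 0))
>
> i.e. operand 0 remains a bare address; the DImode MEM only exists
> transiently inside the output template while the instruction is printed.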
> 

Ah!  I'd missed that.

This is OK for stage1.

R.

> Kyrill
> 
>> R.
>>
>>> commit a324e2f2ea243fe42f23a026ecbe1435876e2c8b
>>> Author: Kyrylo Tkachov <kyrylo.tkac...@arm.com>
>>> Date:   Thu Feb 2 14:46:11 2017 +0000
>>>
>>>      [AArch64] Accept more addressing modes for PRFM
>>>
>>> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
>>> index babc327..61706de 100644
>>> --- a/gcc/config/aarch64/aarch64-protos.h
>>> +++ b/gcc/config/aarch64/aarch64-protos.h
>>> @@ -300,6 +300,7 @@ extern struct tune_params aarch64_tune_params;
>>>  HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
>>>   int aarch64_get_condition_code (rtx);
>>> +bool aarch64_address_valid_for_prefetch_p (rtx, bool);
>>>   bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
>>>   unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT val_in);
>>>   unsigned HOST_WIDE_INT aarch64_and_split_imm2 (HOST_WIDE_INT val_in);
>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>> index acc093a..c05eff3 100644
>>> --- a/gcc/config/aarch64/aarch64.c
>>> +++ b/gcc/config/aarch64/aarch64.c
>>> @@ -4549,6 +4549,24 @@ aarch64_classify_address (struct aarch64_address_info *info,
>>>       }
>>>   }
>>>   +/* Return true if the address X is valid for a PRFM instruction.
>>> +   STRICT_P is true if we should do strict checking with
>>> +   aarch64_classify_address.  */
>>> +
>>> +bool
>>> +aarch64_address_valid_for_prefetch_p (rtx x, bool strict_p)
>>> +{
>>> +  struct aarch64_address_info addr;
>>> +
>>> +  /* PRFM accepts the same addresses as DImode...  */
>>> +  bool res = aarch64_classify_address (&addr, x, DImode, MEM, strict_p);
>>> +  if (!res)
>>> +    return false;
>>> +
>>> +  /* ... except writeback forms.  */
>>> +  return addr.type != ADDRESS_REG_WB;
>>> +}
>>> +
>>>   bool
>>>   aarch64_symbolic_address_p (rtx x)
>>>   {
>>> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
>>> index b9e8edf..c6201a5 100644
>>> --- a/gcc/config/aarch64/aarch64.md
>>> +++ b/gcc/config/aarch64/aarch64.md
>>> @@ -518,27 +518,31 @@ (define_insn "nop"
>>>   )
>>>     (define_insn "prefetch"
>>> -  [(prefetch (match_operand:DI 0 "register_operand" "r")
>>> +  [(prefetch (match_operand:DI 0 "aarch64_prefetch_operand" "Dp")
>>>               (match_operand:QI 1 "const_int_operand" "")
>>>               (match_operand:QI 2 "const_int_operand" ""))]
>>>     ""
>>>     {
>>> -    const char * pftype[2][4] =
>>> +    const char * pftype[2][4] =
>>>       {
>>> -      {"prfm\\tPLDL1STRM, %a0",
>>> -       "prfm\\tPLDL3KEEP, %a0",
>>> -       "prfm\\tPLDL2KEEP, %a0",
>>> -       "prfm\\tPLDL1KEEP, %a0"},
>>> -      {"prfm\\tPSTL1STRM, %a0",
>>> -       "prfm\\tPSTL3KEEP, %a0",
>>> -       "prfm\\tPSTL2KEEP, %a0",
>>> -       "prfm\\tPSTL1KEEP, %a0"},
>>> +      {"prfm\\tPLDL1STRM, %0",
>>> +       "prfm\\tPLDL3KEEP, %0",
>>> +       "prfm\\tPLDL2KEEP, %0",
>>> +       "prfm\\tPLDL1KEEP, %0"},
>>> +      {"prfm\\tPSTL1STRM, %0",
>>> +       "prfm\\tPSTL3KEEP, %0",
>>> +       "prfm\\tPSTL2KEEP, %0",
>>> +       "prfm\\tPSTL1KEEP, %0"},
>>>       };
>>>         int locality = INTVAL (operands[2]);
>>>         gcc_assert (IN_RANGE (locality, 0, 3));
>>>   +    /* PRFM accepts the same addresses as a 64-bit LDR so wrap
>>> +       the address into a DImode MEM so that aarch64_print_operand knows
>>> +       how to print it.  */
>>> +    operands[0] = gen_rtx_MEM (DImode, operands[0]);
>>>       return pftype[INTVAL(operands[1])][locality];
>>>     }
>>>     [(set_attr "type" "load1")]
>>> diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
>>> index 5a252c0..b829337 100644
>>> --- a/gcc/config/aarch64/constraints.md
>>> +++ b/gcc/config/aarch64/constraints.md
>>> @@ -214,3 +214,8 @@ (define_constraint "Dd"
>>>   A constraint that matches an immediate operand valid for AdvSIMD scalar."
>>>    (and (match_code "const_int")
>>>         (match_test "aarch64_simd_imm_scalar_p (op, GET_MODE (op))")))
>>> +
>>> +(define_address_constraint "Dp"
>>> +  "@internal
>>> + An address valid for a prefetch instruction."
>>> + (match_test "aarch64_address_valid_for_prefetch_p (op, true)"))
>>> diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
>>> index e83d45b..8e3ea9b 100644
>>> --- a/gcc/config/aarch64/predicates.md
>>> +++ b/gcc/config/aarch64/predicates.md
>>> @@ -165,6 +165,9 @@ (define_predicate "aarch64_mem_pair_operand"
>>>        (match_test "aarch64_legitimate_address_p (mode, XEXP (op, 0), PARALLEL,
>>>                              0)")))
>>>   +(define_predicate "aarch64_prefetch_operand"
>>> +  (match_test "aarch64_address_valid_for_prefetch_p (op, false)"))
>>> +
>>>   (define_predicate "aarch64_valid_symref"
>>>     (match_code "const, symbol_ref, label_ref")
>>>   {
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c b/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c
>>> new file mode 100644
>>> index 0000000..26ab913
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c
>>> @@ -0,0 +1,18 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2" } */
>>> +
>>> +/* Check that we can generate the immediate-offset addressing
>>> +   mode for PRFM.  */
>>> +
>>> +#define ARRSIZE 65
>>> +int *bad_addr[ARRSIZE];
>>> +
>>> +void
>>> +prefetch_for_read (void)
>>> +{
>>> +  int i;
>>> +  for (i = 0; i < ARRSIZE; i++)
>>> +    __builtin_prefetch (bad_addr[i] + 2, 0, 0);
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler-times "prfm.*\\\[x\[0-9\]+, 8\\\]" 1 } } */
>>>
> 
