Hi Richard,

On 15/02/17 15:00, Richard Earnshaw (lists) wrote:
On 03/02/17 17:12, Kyrill Tkachov wrote:
Hi all,

While evaluating Maxim's SW prefetch patches [1] I noticed that the
aarch64 prefetch pattern is
overly restrictive in its address operand. It only accepts simple
register addressing modes.
In fact, the PRFM instruction accepts almost all modes that a normal
64-bit LDR supports.
The restriction in the pattern leads to explicit address calculation
code to be emitted which we could avoid.

This patch relaxes the restrictions on the prefetch define_insn. It
creates a predicate and constraint that
allow the full addressing modes that PRFM allows. Thus for the testcase
in the patch (adapted from one of the existing
__builtin_prefetch tests in the testsuite) we can generate a:
prfm    PLDL1STRM, [x1, 8]

instead of the current
prfm    PLDL1STRM, [x1]
with an explicit increment of x1 by 8 in a separate instruction.

I've removed the %a output modifier in the output template and wrapped
the address operand into a DImode MEM before
passing it down to aarch64_print_operand.

This is because operand 0 is an address operand rather than a memory
operand and thus doesn't have a mode associated
with it.  When processing the 'a' output modifier the code in final.c
will call TARGET_PRINT_OPERAND_ADDRESS with a VOIDmode
argument.  This will ICE on aarch64 because we need a mode for the
memory in order for aarch64_classify_address to work
correctly.  Rather than overriding the VOIDmode in
aarch64_print_operand_address I decided to instead create the DImode
MEM in the "prefetch" output template and treat it as a normal 64-bit
memory address, which at the point of assembly output
is what it is anyway.

With this patch I see a reduction in instruction count in the SPEC2006
benchmarks when SW prefetching is enabled on top
of Maxim's patchset because fewer address calculation instructions are
emitted due to the use of the more expressive
addressing modes. It also fixes a performance regression that I observed
in 410.bwaves from Maxim's patches on Cortex-A72.
I'll be running a full set of benchmarks to evaluate this further, but I
think this is the right thing to do.

Bootstrapped and tested on aarch64-none-linux-gnu.

Maxim, do you want to try this on top of your patches on your hardware
to see if it helps with the regressions you mentioned?

Thanks,
Kyrill


[1] https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02284.html

2016-02-03  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

     * config/aarch64/aarch64.md (prefetch); Adjust predicate and
     constraint on operand 0 to allow more general addressing modes.
     Adjust output template.
     * config/aarch64/aarch64.c (aarch64_address_valid_for_prefetch_p):
     New function.
     * config/aarch64/aarch64-protos.h
     (aarch64_address_valid_for_prefetch_p): Declare prototype.
     * config/aarch64/constraints.md (Dp): New address constraint.
     * config/aarch64/predicates.md (aarch64_prefetch_operand): New
     predicate.

2016-02-03  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

     * gcc.target/aarch64/prfm_imm_offset_1.c: New test.

aarch64-prfm-imm.patch

Hmm, I'm not sure about this.  rtl.texi says that a prefetch code
contains an address, not a MEM.  So it's theoretically possible for
generic code to want to look inside the first operand and find an
address directly.  This change would break that assumption.

With this change the prefetch operand is still an address, not a MEM during all 
the
optimisation passes.
It's wrapped in a MEM only during the ultimate printing of the assembly string
during 'final'.

Kyrill

R.

commit a324e2f2ea243fe42f23a026ecbe1435876e2c8b
Author: Kyrylo Tkachov <kyrylo.tkac...@arm.com>
Date:   Thu Feb 2 14:46:11 2017 +0000

     [AArch64] Accept more addressing modes for PRFM

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index babc327..61706de 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -300,6 +300,7 @@ extern struct tune_params aarch64_tune_params;
HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
  int aarch64_get_condition_code (rtx);
+bool aarch64_address_valid_for_prefetch_p (rtx, bool);
  bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
  unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT val_in);
  unsigned HOST_WIDE_INT aarch64_and_split_imm2 (HOST_WIDE_INT val_in);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index acc093a..c05eff3 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4549,6 +4549,24 @@ aarch64_classify_address (struct aarch64_address_info 
*info,
      }
  }
+/* Return true if the address X is valid for a PRFM instruction.
+   STRICT_P is true if we should do strict checking with
+   aarch64_classify_address.  */
+
+bool
+aarch64_address_valid_for_prefetch_p (rtx x, bool strict_p)
+{
+  struct aarch64_address_info addr;
+
+  /* PRFM accepts the same addresses as DImode...  */
+  bool res = aarch64_classify_address (&addr, x, DImode, MEM, strict_p);
+  if (!res)
+    return false;
+
+  /* ... except writeback forms.  */
+  return addr.type != ADDRESS_REG_WB;
+}
+
  bool
  aarch64_symbolic_address_p (rtx x)
  {
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index b9e8edf..c6201a5 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -518,27 +518,31 @@ (define_insn "nop"
  )
(define_insn "prefetch"
-  [(prefetch (match_operand:DI 0 "register_operand" "r")
+  [(prefetch (match_operand:DI 0 "aarch64_prefetch_operand" "Dp")
              (match_operand:QI 1 "const_int_operand" "")
              (match_operand:QI 2 "const_int_operand" ""))]
    ""
    {
-    const char * pftype[2][4] =
+    const char * pftype[2][4] =
      {
-      {"prfm\\tPLDL1STRM, %a0",
-       "prfm\\tPLDL3KEEP, %a0",
-       "prfm\\tPLDL2KEEP, %a0",
-       "prfm\\tPLDL1KEEP, %a0"},
-      {"prfm\\tPSTL1STRM, %a0",
-       "prfm\\tPSTL3KEEP, %a0",
-       "prfm\\tPSTL2KEEP, %a0",
-       "prfm\\tPSTL1KEEP, %a0"},
+      {"prfm\\tPLDL1STRM, %0",
+       "prfm\\tPLDL3KEEP, %0",
+       "prfm\\tPLDL2KEEP, %0",
+       "prfm\\tPLDL1KEEP, %0"},
+      {"prfm\\tPSTL1STRM, %0",
+       "prfm\\tPSTL3KEEP, %0",
+       "prfm\\tPSTL2KEEP, %0",
+       "prfm\\tPSTL1KEEP, %0"},
      };
int locality = INTVAL (operands[2]); gcc_assert (IN_RANGE (locality, 0, 3)); + /* PRFM accepts the same addresses as a 64-bit LDR so wrap
+       the address into a DImode MEM so that aarch64_print_operand knows
+       how to print it.  */
+    operands[0] = gen_rtx_MEM (DImode, operands[0]);
      return pftype[INTVAL(operands[1])][locality];
    }
    [(set_attr "type" "load1")]
diff --git a/gcc/config/aarch64/constraints.md 
b/gcc/config/aarch64/constraints.md
index 5a252c0..b829337 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -214,3 +214,8 @@ (define_constraint "Dd"
   A constraint that matches an immediate operand valid for AdvSIMD scalar."
   (and (match_code "const_int")
        (match_test "aarch64_simd_imm_scalar_p (op, GET_MODE (op))")))
+
+(define_address_constraint "Dp"
+  "@internal
+ An address valid for a prefetch instruction."
+ (match_test "aarch64_address_valid_for_prefetch_p (op, true)"))
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index e83d45b..8e3ea9b 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -165,6 +165,9 @@ (define_predicate "aarch64_mem_pair_operand"
         (match_test "aarch64_legitimate_address_p (mode, XEXP (op, 0), 
PARALLEL,
                                               0)")))
+(define_predicate "aarch64_prefetch_operand"
+  (match_test "aarch64_address_valid_for_prefetch_p (op, false)"))
+
  (define_predicate "aarch64_valid_symref"
    (match_code "const, symbol_ref, label_ref")
  {
diff --git a/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c 
b/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c
new file mode 100644
index 0000000..26ab913
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* Check that we can generate the immediate-offset addressing
+   mode for PRFM.  */
+
+#define ARRSIZE 65
+int *bad_addr[ARRSIZE];
+
+void
+prefetch_for_read (void)
+{
+  int i;
+  for (i = 0; i < ARRSIZE; i++)
+    __builtin_prefetch (bad_addr[i] + 2, 0, 0);
+}
+
+/* { dg-final { scan-assembler-times "prfm.*\\\[x\[0-9\]+, 8\\\]" 1 } } */


Reply via email to