Re: Missed lowering to ld1rq from svld1rq for memory operand

Prathamesh Kulkarni via Gcc-patches Tue, 10 Jan 2023 09:35:19 -0800

On Fri, 5 Aug 2022 at 17:49, Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
> > Hi Richard,
> > Following from off-list discussion, in the attached patch, I wrote a
> > pattern similar to vec_duplicate<mode>_reg, which seems to work for the
> > svld1rq tests.
> > Does it look OK?
> >
> > Sorry, I didn't fully understand your suggestion on integrating with the
> > vec_duplicate<mode>_reg pattern. For vec_duplicate<mode>_reg, the operand
> > to vec_duplicate is expected to have mode <VEL>, while the pattern in the
> > patch expects the operand of vec_duplicate to have mode <V128>.
> > How do we write a pattern so that an operand can accept either of the
> > two modes?
>
> I quoted the wrong one, sorry, should have been
> aarch64_vec_duplicate_vq<mode>_le.
>
> > Also it seems <V128> cannot be used with SVE_ALL ?
>
> Yeah, these would be SVE_FULL only.
Hi Richard,
Sorry for the very late reply. I have attached a patch that integrates
the ld1rq lowering into aarch64_vec_duplicate_vq<mode>_le.
Bootstrapped and tested on aarch64-linux-gnu.
OK to commit?


Thanks,
Prathamesh
>
> Richard
>

gcc/
        * config/aarch64/aarch64-sve.md (aarch64_vec_duplicate_vq<mode>_le):
        Change to define_insn_and_split to fold ldr+dup to ld1rq.
        * config/aarch64/predicates.md (aarch64_sve_dup_ld1rq_operand): New.

testsuite/
        * gcc.target/aarch64/sve/acle/general/pr96463-2.c: Adjust.

diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index b8cc47ef5fc..4548375b8d6 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -2533,14 +2533,34 @@
 )
 
 ;; Duplicate an Advanced SIMD vector to fill an SVE vector (LE version).
-(define_insn "@aarch64_vec_duplicate_vq<mode>_le"
-  [(set (match_operand:SVE_FULL 0 "register_operand" "=w")
+
+(define_insn_and_split "@aarch64_vec_duplicate_vq<mode>_le"
+  [(set (match_operand:SVE_FULL 0 "register_operand" "=w, w")
        (vec_duplicate:SVE_FULL
-         (match_operand:<V128> 1 "register_operand" "w")))]
+         (match_operand:<V128> 1 "aarch64_sve_dup_ld1rq_operand" "w, UtQ")))
+   (clobber (match_scratch:VNx16BI 2 "=X, Upl"))]
   "TARGET_SVE && !BYTES_BIG_ENDIAN"
   {
-    operands[1] = gen_rtx_REG (<MODE>mode, REGNO (operands[1]));
-    return "dup\t%0.q, %1.q[0]";
+    switch (which_alternative)
+      {
+       case 0:
+         operands[1] = gen_rtx_REG (<MODE>mode, REGNO (operands[1]));
+         return "dup\t%0.q, %1.q[0]";
+       case 1:
+         return "#";
+       default:
+         gcc_unreachable ();
+      }
+  }
+  "&& MEM_P (operands[1])"
+  [(const_int 0)]
+  {
+    if (GET_CODE (operands[2]) == SCRATCH)
+      operands[2] = gen_reg_rtx (VNx16BImode);
+    emit_move_insn (operands[2], CONSTM1_RTX (VNx16BImode));
+    rtx gp = gen_lowpart (<VPRED>mode, operands[2]);
+    emit_insn (gen_aarch64_sve_ld1rq<mode> (operands[0], operands[1], gp));
+    DONE;
   }
 )
 
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index ff7f73d3f30..6062f37025e 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -676,6 +676,10 @@
   (ior (match_operand 0 "register_operand")
        (match_operand 0 "aarch64_sve_ld1r_operand")))
 
+(define_predicate "aarch64_sve_dup_ld1rq_operand"
+  (ior (match_operand 0 "register_operand")
+       (match_operand 0 "aarch64_sve_ld1rq_operand")))
+
 (define_predicate "aarch64_sve_ptrue_svpattern_immediate"
   (and (match_code "const")
        (match_test "aarch64_sve_ptrue_svpattern_p (op, NULL)")))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr96463-2.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr96463-2.c
index 196de3f5e0a..c38204e6874 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr96463-2.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr96463-2.c
@@ -26,4 +26,4 @@ TEST(svfloat64_t, float64_t, f64)
 
 TEST(svbfloat16_t, bfloat16_t, bf16)
 
-/* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.q, z[0-9]+\.q\[0\]} 12 { target aarch64_little_endian } } } */
+/* { dg-final { scan-assembler-not {\tdup\t} } } */
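
For readers following the fold of ldr+dup into ld1rq, here is a rough
scalar model in portable C of why the two sequences should be
interchangeable. It is only a sketch and not part of the patch: the
function names and the fixed 512-bit vector length (MODEL_VL_BYTES) are
assumptions made for illustration, and the predicate is taken to be
all-true, which matches the CONSTM1_RTX value the split emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QUAD_BYTES 16      /* one 128-bit Advanced SIMD vector */
#define MODEL_VL_BYTES 64  /* hypothetical 512-bit SVE vector length */

/* Model of "ldr q0, [x0]" followed by "dup z0.q, z0.q[0]":
   load one quadword, then copy it into every 128-bit block.  */
static void
ldr_then_dup (uint8_t *z, const uint8_t *mem)
{
  uint8_t q[QUAD_BYTES];

  assert (MODEL_VL_BYTES % QUAD_BYTES == 0);
  memcpy (q, mem, QUAD_BYTES);
  for (size_t i = 0; i < MODEL_VL_BYTES; i += QUAD_BYTES)
    memcpy (z + i, q, QUAD_BYTES);
}

/* Model of a single "ld1rqb" with an all-true governing predicate:
   every byte lane reads the matching byte of the one quadword in memory.  */
static void
ld1rq_all_true (uint8_t *z, const uint8_t *mem)
{
  for (size_t i = 0; i < MODEL_VL_BYTES; i++)
    z[i] = mem[i % QUAD_BYTES];
}

/* Return 1 if both models produce the same vector contents, else 0.  */
static int
sequences_agree (void)
{
  uint8_t mem[QUAD_BYTES];
  uint8_t a[MODEL_VL_BYTES];
  uint8_t b[MODEL_VL_BYTES];

  for (size_t i = 0; i < QUAD_BYTES; i++)
    mem[i] = (uint8_t) (i * 3 + 1);

  ldr_then_dup (a, mem);
  ld1rq_all_true (b, mem);
  return memcmp (a, b, MODEL_VL_BYTES) == 0;
}
```

Calling sequences_agree () compares the two models on one sample
quadword. The model says nothing about big-endian targets, which the
pattern excludes through !BYTES_BIG_ENDIAN.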
