Re: [PATCH] amdgcn: Add instruction patterns for vector operations on complex numbers

Andrew Stubbs Tue, 14 Feb 2023 06:29:30 -0800

On 09/02/2023 20:13, Andrew Jenner wrote:

This patch introduces instruction patterns for complex number operationsin the GCN machine description. These patterns are cmul, cmul_conj,vec_addsub, vec_fmaddsub, vec_fmsubadd, cadd90, cadd270, cmla and cmls(cmla_conj and cmls_conj were not found to be favorable to implement).As a side effect of adding cmls, I also added fms patterns correspondingto the existing fma patterns. Tested on CDNA2 GFX90a.


OK to commit?


gcc/ChangeLog:

     * config/gcn/gcn-protos.h (gcn_expand_dpp_swap_pairs_insn)
         (gcn_expand_dpp_distribute_even_insn)
         (gcn_expand_dpp_distribute_odd_insn): Declare.
         * config/gcn/gcn-valu.md (@dpp_swap_pairs<mode>)
         (@dpp_distribute_even<mode>, @dpp_distribute_odd<mode>)
         (cmul<conj_op><mode>3, cml<addsub_as><mode>4, vec_addsub<mode>3)
         (cadd<rot><mode>3, vec_fmaddsub<mode>4, vec_fmsubadd<mode>4)
         (fms<mode>4<exec>, fms<mode>4_negop2<exec>, fms<mode>4)
         (fms<mode>4_negop2): New patterns.
         * config/gcn/gcn.cc (gcn_expand_dpp_swap_pairs_insn)
         (gcn_expand_dpp_distribute_even_insn)
         (gcn_expand_dpp_distribute_odd_insn): New functions.
         * config/gcn/gcn.md: Add entries to unspec enum.

gcc/testsuite/ChangeLog:

     * gcc.target/gcn/complex.c: New test.


+;; It would be possible to represent these without the UNSPEC as
+;;
+;; (vec_merge
+;;   (fma op1 op2 op3)
+;;   (fma op1 op2 (neg op3))
+;;   (merge-const))
+;;
+;; But this doesn't seem useful in practice.
+
+(define_expand "vec_fmaddsub<mode>4"
+  [(set (match_operand:V_noHI 0 "register_operand" "=&v")
+        (unspec:V_noHI
+          [(match_operand:V_noHI 1 "register_operand" "v")
+           (match_operand:V_noHI 2 "register_operand" "v")
+           (match_operand:V_noHI 3 "register_operand" "v")]
+          UNSPEC_FMADDSUB))]

This is a define_expand pattern that has a custom-code expansion with anunconditional "DONE", so the actual RTL representation is irrelevanthere: it only needs to have the match_operand entries. TheUNSPEC_FMADDSUB is therefore dead (as in, it will never appear in theIR). We can safely remove those, although I don't hate them forreadability purposes.

The UNSPEC_CMUL and UNSPEC_CMUL_CONJ are similarly "dead", but since youuse them for an iterator they're still useful in the machine description.


+(define_insn "fms<mode>4<exec>"
+  [(set (match_operand:V_FP 0 "register_operand"  "=  v,   v")
+       (fma:V_FP
+         (match_operand:V_FP 1 "gcn_alu_operand" "% vA,  vA")
+       (match_operand:V_FP 2 "gcn_alu_operand" "  vA,vSvA")
+       (neg:V_FP
+         (match_operand:V_FP 3 "gcn_alu_operand" "vSvA,  vA"))))]
+  ""
+  "v_fma%i0\t%0, %1, %2, -%3"
+  [(set_attr "type" "vop3a")
+   (set_attr "length" "8")])

Please ensure that the alternatives are vertically aligned in the samestyle as the rest of the file.


+/* Generate DPP pairwise swap instruction.
+   The opcode is given by INSN.  */
+
+char *
+gcn_expand_dpp_swap_pairs_insn (machine_mode mode, const char *insn,
+                               int ARG_UNUSED (unspec))
....
+/* Generate DPP distribute even instruction.
+   The opcode is given by INSN.  */
+
+char *
+gcn_expand_dpp_distribute_even_insn (machine_mode mode, const char *insn,
+                                    int ARG_UNUSED (unspec))
....
+/* Generate DPP distribute odd instruction.
+   The opcode is given by INSN.  */
+
+char *
+gcn_expand_dpp_distribute_odd_insn (machine_mode mode, const char *insn,
+                                   int ARG_UNUSED (unspec))

Please add a comment that isn't just the function name in words. Explainwhat operation happens here and maybe show an example of what it produces.


+++ b/gcc/testsuite/gcc.target/gcn/complex.c
@@ -0,0 +1,640 @@
+// { dg-do run }
+// { dg-options "-O -fopenmp-simd -ftree-loop-if-convert -fno-ssa-phiopt" }

Does the -fopenmp-simd option do anything here? There are no "ompdeclare simd" directives.


+void cmulF(float *td, float *te, float *tf, float *tg, int tas)
+{
+  typedef _Complex float complexT;
+  int array_size = tas/2;
+  complexT *d = (complexT*)(td);
+  complexT *e = (complexT*)(te);
+  complexT *f = (complexT*)(tf);
+#pragma omp target teams distribute parallel for simd
+  for (int i = 0; i < array_size; i++)
+    {
+      d[i] = e[i] * f[i];
+    }
+}

Tests in gcc.target/gcn won't do anything with "omp target" directives.I would expect the loop to vectorize without, at -O2 or above (or "-O1-ftree-vectorize"), but you might find the output easier to read with"__restrict" on the parameters as that will avoid emitting the runtimealias check and scalar code implementation.


I'd also expect you to have to do something to avoid inlining.

+ td = (float*)omp_aligned_alloc(ALIGNMENT, sizeof(float)*array_size,omp_default_mem_alloc);+ te = (float*)omp_aligned_alloc(ALIGNMENT, sizeof(float)*array_size,omp_default_mem_alloc);+ tf = (float*)omp_aligned_alloc(ALIGNMENT, sizeof(float)*array_size,omp_default_mem_alloc);+ tg = (float*)omp_aligned_alloc(ALIGNMENT, sizeof(float)*array_size,omp_default_mem_alloc);

There's no need to use libgomp to allocate memory on the device --malloc works just fine -- and it doesn't need to be specifically alignedunless you're wanting performance.

In general I'm confused by this testcase because it looks like it waswritten for an offloading toolchain, but it's placed into thebare-machine GCN testsuite.


Andrew

Re: [PATCH] amdgcn: Add instruction patterns for vector operations on complex numbers

Reply via email to