RE: [PATCH v2 6/16]middle-end Add Complex Addition with rotation detection

Tamar Christina via Gcc-patches Tue, 03 Nov 2020 07:07:48 -0800

Hi All,

here is a respin with the requested changes.


I just realized I haven't updated the documentation yet but will do
so soon since I'm sure there will be feedback :)

Thanks,
Tamar

gcc/ChangeLog:

        * doc/md.texi: Document optabs.
        * internal-fn.def (COMPLEX_ADD_ROT90, COMPLEX_ADD_ROT270): New.
        * optabs.def (cadd90_optab, cadd270_optab): New.
        * tree-vect-slp-patterns.c (linear_loads_p, vect_slp_make_linear,
        class complex_add_pattern,complex_add_pattern::matches): New.
        (complex_operations_pattern::matches): Add complex_add_pattern.

> -----Original Message-----
> From: [email protected] <[email protected]> On
> Behalf Of Richard Biener
> Sent: Tuesday, September 29, 2020 11:44 AM
> To: Richard Sandiford <[email protected]>
> Cc: Tamar Christina <[email protected]>; [email protected];
> nd <[email protected]>; [email protected]
> Subject: Re: [PATCH v2 6/16]middle-end Add Complex Addition with rotation
> detection
> 
> On Tue, 29 Sep 2020, Richard Sandiford wrote:
> 
> > Tamar Christina <[email protected]> writes:
> > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index
> > >
> 2b46286943778e16d95b15def4299bcbf8db7eb8..71e226505b2619d10982b59a
> 4e
> > > bbed73a70f29be 100644
> > > --- a/gcc/doc/md.texi
> > > +++ b/gcc/doc/md.texi
> > > @@ -6132,6 +6132,17 @@ floating-point mode.
> > >
> > >  This pattern is not allowed to @code{FAIL}.
> > >
> > > +@cindex @code{cadd@var{m}@var{n}3} instruction pattern @item
> > > +@samp{cadd@var{m}@var{n}3} Perform a vector addition of complex
> > > +numbers in operand 1 with operand 2 rotated by @var{m} degrees
> > > +around the argand plane and storing the result in operand 0.  The
> > > +instruction must perform the operation on data loaded contiguously
> > > +into the vectors.
> >
> > Nitpicking, sorry, but I think it would be better to describe the
> > layout directly rather than in terms of loads, since the preceding
> > operation might not be a load.
> 
> So if we're at that and since GCC vectors do not have complex components
> can we formulate this in terms avoiding 'complex'?
> Isn't this an add of one vector to a vector with adjacant lanes swapped and
> possibly negated?  Mentioning that this would match a complex add in case
> lanes happen to match up with complex real/imag parts is OK but the pattern
> should work equally well if there's no complex numbers involved?
> 
> > I guess the main question is: what representation do we expect for
> > big-endian?  A normal Advanced SIMD LDR would give this (for floats):
> >
> >              MEMORY
> >    +-----+-----+-----+-----+
> >    | r0  | i0  | r1  | i1  |
> >    +-----+-----+-----+-----+
> >    |  0  |  1  |  2  |  3  |   array numbering
> >    +-----+-----+-----+-----+
> >       V     V     V     V      Advanced SIMD LDR
> >    +-----+-----+-----+-----+
> >    | r0  | i0  | r1  | i1  |
> >    +-----+-----+-----+-----+
> >    |  0  |  1  |  2  |  3  |   GCC lane numbering
> >    +-----+-----+-----+-----+
> >    |  3  |  2  |  1  |  0  |   Arm lane numbering
> >    +-----+-----+-----+-----+
> >   MSB       REGISTER      LSB
> >
> > but the FC* instructions put the imaginary parts in the more
> > significant lane, so the pairs of elements above would need to be
> > reversed:
> >
> >              MEMORY
> >    +-----+-----+-----+-----+
> >    | r0  | i0  | r1  | i1  |
> >    +-----+-----+-----+-----+
> >    |  0  |  1  |  2  |  3  |   array numbering
> >    +-----+-----+-----+-----+
> >        \   /       \   /
> >         \ /         \ /
> >          X           X         Load and permute
> >         / \         / \
> >        /   \       /   \
> >    +-----+-----+-----+-----+
> >    | i0  | r0  | i1  | r1  |
> >    +-----+-----+-----+-----+
> >    |  0  |  1  |  2  |  3  |   GCC lane numbering
> >    +-----+-----+-----+-----+
> >    |  3  |  2  |  1  |  0  |   Arm lane numbering
> >    +-----+-----+-----+-----+
> >   MSB       REGISTER      LSB
> >
> > (Or the whole vector could be reversed.)
> >
> > We might decide that it just isn't worth doing this for Advanced SIMD.
> > But should the semantics of the optab be that:
> >
> > (1) GCC lane number 0 holds a real part, or
> > (2) the least significant lane holds a real part?
> >
> > With (1), it would be up to the target to hide the permute above.
> > With (2), the vectoriser would need to introduce the permute itself.
> >
> > I'm not sure there's a perfect answer even for Arm targets.  (2)
> > matches the Advanced SIMD semantics.  But for SVE, the register layout
> > follows
> > LD1 rather than LDR, and the GCC and architectural lane numbering match
> up.
> > (1) would therefore be better than (2) for SVE (and so no permute
> > would be needed for either endianness on SVE).
> >
> > > +The operation is only supported for vector modes @var{n} and with
> > > +rotations @var{m} of 90 or 270.
> > > +
> > > +This pattern is not allowed to @code{FAIL}.
> > > +
> > >  @cindex @code{ffs@var{m}2} instruction pattern  @item
> > > @samp{ffs@var{m}2}  Store into operand 0 one plus the index of the
> > > least significant 1-bit diff --git a/gcc/internal-fn.def
> > > b/gcc/internal-fn.def index
> > >
> 13e60828fcf5db6c5f15aae2bacd4cf04029e430..956a65a338c157b51de7e78a3f
> > > b005b5af78ef31 100644
> > > --- a/gcc/internal-fn.def
> > > +++ b/gcc/internal-fn.def
> > > @@ -275,6 +275,8 @@ DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST, scalb,
> > > binary)  DEF_INTERNAL_FLT_FLOATN_FN (FMIN, ECF_CONST, fmin,
> binary)
> > > DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary)
> > > DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary)
> > > +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST,
> cadd90,
> > > +binary) DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270,
> ECF_CONST,
> > > +cadd270, binary)
> > >
> > >  /* FP scales.  */
> > >  DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary) diff --git
> > > a/gcc/optabs.def b/gcc/optabs.def index
> > >
> 78409aa14537d259bf90277751aac00d452a0d3f..2bb0bf857977035bf562a77f5f
> > > 6848e80edf936d 100644
> > > --- a/gcc/optabs.def
> > > +++ b/gcc/optabs.def
> > > @@ -290,6 +290,8 @@ OPTAB_D (atan_optab, "atan$a2")  OPTAB_D
> > > (atanh_optab, "atanh$a2")  OPTAB_D (copysign_optab, "copysign$F$a3")
> > > OPTAB_D (xorsign_optab, "xorsign$F$a3")
> > > +OPTAB_D (cadd90_optab, "cadd90$a3") OPTAB_D (cadd270_optab,
> > > +"cadd270$a3")
> > >  OPTAB_D (cos_optab, "cos$a2")
> > >  OPTAB_D (cosh_optab, "cosh$a2")
> > >  OPTAB_D (exp10_optab, "exp10$a2")
> > > diff --git a/gcc/tree-vect-slp-patterns.c
> > > b/gcc/tree-vect-slp-patterns.c index
> > >
> 6453a5b1b6464dba833adc2c2a194db5e712bb79..b2b0ac62e9a69145470f41d2
> ba
> > > c736dd970be735 100644
> > > --- a/gcc/tree-vect-slp-patterns.c
> > > +++ b/gcc/tree-vect-slp-patterns.c
> > > @@ -663,12 +663,94 @@ graceful_exit:
> > >      }
> > >  };
> > >
> > > +class ComplexAddPattern : public ComplexPattern
> >
> > Another nitpick, sorry, but type names should be lower case rather
> > than CamelCase.
> >
> > Thanks,
> > Richard
> >
> 
> --
> Richard Biener <[email protected]>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> Nuernberg, Germany; GF: Felix Imend

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 2b46286943778e16d95b15def4299bcbf8db7eb8..71e226505b2619d10982b59a4ebbed73a70f29be 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6132,6 +6132,17 @@ floating-point mode.
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{cadd@var{m}@var{n}3} instruction pattern
+@item @samp{cadd@var{m}@var{n}3}
+Perform a vector addition of complex numbers in operand 1 with operand 2
+rotated by @var{m} degrees around the argand plane and storing the result in
+operand 0.  The instruction must perform the operation on data loaded
+contiguously into the vectors.
+The operation is only supported for vector modes @var{n} and with
+rotations @var{m} of 90 or 270.
+
+This pattern is not allowed to @code{FAIL}.
+
 @cindex @code{ffs@var{m}2} instruction pattern
 @item @samp{ffs@var{m}2}
 Store into operand 0 one plus the index of the least significant 1-bit
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 310d37aa53819791b5df1683afca831f08e5892a..33c54be1e158ddea25c4cd6b1148df8cf4a509b5 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -277,6 +277,9 @@ DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST, scalb, binary)
 DEF_INTERNAL_FLT_FLOATN_FN (FMIN, ECF_CONST, fmin, binary)
 DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary)
 DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, cadd90, binary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
+
 
 /* FP scales.  */
 DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 78409aa14537d259bf90277751aac00d452a0d3f..2bb0bf857977035bf562a77f5f6848e80edf936d 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -290,6 +290,8 @@ OPTAB_D (atan_optab, "atan$a2")
 OPTAB_D (atanh_optab, "atanh$a2")
 OPTAB_D (copysign_optab, "copysign$F$a3")
 OPTAB_D (xorsign_optab, "xorsign$F$a3")
+OPTAB_D (cadd90_optab, "cadd90$a3")
+OPTAB_D (cadd270_optab, "cadd270$a3")
 OPTAB_D (cos_optab, "cos$a2")
 OPTAB_D (cosh_optab, "cosh$a2")
 OPTAB_D (exp10_optab, "exp10$a2")
diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
index 65044a77d55e24cde6c663e81c11b66ad9650056..0732cf0a6d93be8590b84c39dff82940b280e46b 100644
--- a/gcc/tree-vect-slp-patterns.c
+++ b/gcc/tree-vect-slp-patterns.c
@@ -157,6 +157,114 @@ typedef enum _complex_operation : unsigned {
 
 typedef struct map_ { int a; int b; } map_t;
 
+/*******************************************************************************
+ * General helper functions
+ ******************************************************************************/
+
+/* Check to see if all loads rooted in ROOT are linear.  Linearity is
+   defined as having no gaps between values loaded.  */
+
+static load_permutation_t
+linear_loads_p (slp_tree root, bool *linear)
+{
+  *linear = false;
+  if (!root)
+    return vNULL;
+
+  unsigned i;
+  load_permutation_t loads = vNULL;
+
+  if (SLP_TREE_LOAD_PERMUTATION (root).exists ())
+    {
+      loads = SLP_TREE_LOAD_PERMUTATION (root);
+      unsigned leader = loads[0];
+      unsigned load;
+      FOR_EACH_VEC_ELT_FROM (loads, i, load, 1)
+	if (load != ++leader)
+      return loads;
+    }
+
+  slp_tree child;
+  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (root), i, child)
+    {
+      loads = linear_loads_p (child, linear);
+      if (!*linear)
+	return loads;
+    }
+
+  *linear = true;
+  return loads;
+}
+
+/* This function attempts to make a node rooted in NODE linear.  If the node
+   if already linear than the node itself is returned in RESULT.
+
+   If the node is not linear then a new VEC_PERM_EXPR node is created with a
+   lane permute that when applied will make the node linear.   If such a
+   permute cannot be created then FALSE is returned from the function.
+
+   Here linearity is defined as having a sequential, monotically increasing
+   load position inside the load permute generated by the loads reachable from
+   NODE.  */
+
+static bool
+vect_slp_make_linear (slp_tree parent, slp_tree node, slp_tree *result)
+{
+  bool is_linear = false;
+  unsigned x, val;
+  load_permutation_t load_perm = linear_loads_p (node, &is_linear);
+  if (is_linear)
+    {
+      *result = node;
+      return true;
+    }
+
+  /* Attempt to linearise the permute.  */
+  vec<std::pair<unsigned, unsigned> > zipped;
+  zipped.create (load_perm.length ());
+  FOR_EACH_VEC_ELT (load_perm, x, val)
+    zipped.quick_push (std::make_pair (val, x));
+  typedef const std::pair<unsigned, unsigned>* cmp_t;
+  zipped.qsort ([](const void *a, const void *b) -> int
+    { return (int)((cmp_t)a)->first - (int)((cmp_t)b)->first; });
+
+  /* Verify if we have a linear permute sequence.  */
+  if (zipped.length () > 0)
+    {
+      unsigned leader = zipped[0].first;
+      for (x = 1; x < zipped.length (); x++)
+	if(!(is_linear = (zipped[x].first == ++leader)))
+	  break;
+    }
+  else
+    is_linear = true;
+
+  if (!is_linear)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			"Loads could not be made linear %p\n",
+			node);
+      zipped.release ();
+      return false;
+  }
+
+  for (x = 0; x < zipped.length (); x++)
+    zipped[x].first = 0;
+
+  /* Create the new permute node and store it instead.  */
+  slp_tree vnode = vect_create_new_slp_node (vNULL, 1);
+  SLP_TREE_CODE (vnode) = VEC_PERM_EXPR;
+  SLP_TREE_LANE_PERMUTATION (vnode) = zipped;
+  SLP_TREE_VECTYPE (vnode) = SLP_TREE_VECTYPE (parent);
+  SLP_TREE_CHILDREN (vnode).quick_push (node);
+  SLP_TREE_REF_COUNT (vnode) = 1;
+  SLP_TREE_LANES (vnode) = SLP_TREE_LANES (node);
+  SLP_TREE_REPRESENTATIVE (vnode) = SLP_TREE_REPRESENTATIVE (parent);
+  *result = vnode;
+  return is_linear;
+}
+
 /*******************************************************************************
  * Simple vector pattern matcher
  ******************************************************************************/
@@ -532,6 +640,93 @@ complex_pattern::validate_p ()
   return true;
 }
 
+
+/*******************************************************************************
+ * complex_add_pattern class
+ ******************************************************************************/
+
+class complex_add_pattern : public complex_pattern
+{
+  protected:
+    complex_add_pattern (slp_tree *node, vec_info *vinfo)
+      : complex_pattern (node, vinfo)
+    {
+      this->m_arity = 2;
+      this->m_num_args = 2;
+    }
+
+  public:
+    static vect_pattern* create (slp_tree *node, vec_info *vinfo)
+    {
+       return new complex_add_pattern (node, vinfo);
+    }
+
+    const char* get_name ()
+    {
+      return "Complex Addition";
+    }
+
+    bool matches ();
+    bool matches (complex_operation_t op, vec<slp_tree> ops);
+};
+
+/* Pattern matcher for trying to match complex addition pattern in SLP tree.
+
+   If no match is found then IFN is set to IFN_LAST.
+   This function matches the patterns shaped as:
+
+   c[i] = a[i] - b[i+1];
+   c[i+1] = a[i+1] + b[i];
+
+   If a match occurred then TRUE is returned, else FALSE.  */
+
+bool
+complex_add_pattern::matches (complex_operation_t op, vec<slp_tree> ops)
+{
+  this->m_ifn = IFN_LAST;
+  this->m_ops.safe_splice (ops);
+
+  /* Find the two components.  Rotation in the complex plane will modify
+     the operations:
+
+      * Rotation  0: + +
+      * Rotation 90: - +
+      * Rotation 180: - -
+      * Rotation 270: + -
+
+      Rotation 0 and 180 can be handled by normal SIMD code, so we don't need
+      to care about them here.  */
+  if (op == MINUS_PLUS)
+    this->m_ifn = IFN_COMPLEX_ADD_ROT90;
+  else if (op == PLUS_MINUS)
+    this->m_ifn = IFN_COMPLEX_ADD_ROT270;
+
+  /* verify that there is a permute, otherwise this isn't a pattern we
+     we support.  */
+  bool is_linear = false;
+  bool all_linear = true;
+  unsigned x;
+  for (x = 0; x < this->m_ops.length () && all_linear; x++)
+    {
+      linear_loads_p (this->m_ops[x], &is_linear);
+      all_linear = all_linear && is_linear;
+    }
+
+  if (this->m_ifn == IFN_LAST || all_linear)
+    return false;
+
+  save_match ();
+  return true;
+}
+
+bool
+complex_add_pattern::matches ()
+{
+  complex_operation_t op
+    = vect_detect_pair_op (*this->m_node, true, &this->m_ops);
+  return matches (op, this->m_ops);
+}
+
 /*******************************************************************************
  * complex_operations_pattern class
  ******************************************************************************/
@@ -580,6 +775,13 @@ complex_operations_pattern::matches ()
   if (op == CMPLX_NONE)
     return false;
 
+  /* Check which pattern this may be.  Match longest pattern first.  */
+  this->m_patt = complex_add_pattern::create (this->m_node, this->m_vinfo);
+  if (this->m_patt->matches (op, this->m_ops))
+    return true;
+
+  delete this->m_patt;
+
   this->m_patt = NULL;
   return false;
 }

RE: [PATCH v2 6/16]middle-end Add Complex Addition with rotation detection

Reply via email to