Hi All, here is a respin with the requested changes.
I just realized I haven't updated the documentation yet but will do so soon since I'm sure there will be feedback :) Thanks, Tamar gcc/ChangeLog: * doc/md.texi: Document optabs. * internal-fn.def (COMPLEX_ADD_ROT90, COMPLEX_ADD_ROT270): New. * optabs.def (cadd90_optab, cadd270_optab): New. * tree-vect-slp-patterns.c (linear_loads_p, vect_slp_make_linear, class complex_add_pattern,complex_add_pattern::matches): New. (complex_operations_pattern::matches): Add complex_add_pattern. > -----Original Message----- > From: rguent...@c653.arch.suse.de <rguent...@c653.arch.suse.de> On > Behalf Of Richard Biener > Sent: Tuesday, September 29, 2020 11:44 AM > To: Richard Sandiford <richard.sandif...@arm.com> > Cc: Tamar Christina <tamar.christ...@arm.com>; gcc-patches@gcc.gnu.org; > nd <n...@arm.com>; o...@ucw.cz > Subject: Re: [PATCH v2 6/16]middle-end Add Complex Addition with rotation > detection > > On Tue, 29 Sep 2020, Richard Sandiford wrote: > > > Tamar Christina <tamar.christ...@arm.com> writes: > > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index > > > > 2b46286943778e16d95b15def4299bcbf8db7eb8..71e226505b2619d10982b59a > 4e > > > bbed73a70f29be 100644 > > > --- a/gcc/doc/md.texi > > > +++ b/gcc/doc/md.texi > > > @@ -6132,6 +6132,17 @@ floating-point mode. > > > > > > This pattern is not allowed to @code{FAIL}. > > > > > > +@cindex @code{cadd@var{m}@var{n}3} instruction pattern @item > > > +@samp{cadd@var{m}@var{n}3} Perform a vector addition of complex > > > +numbers in operand 1 with operand 2 rotated by @var{m} degrees > > > +around the argand plane and storing the result in operand 0. The > > > +instruction must perform the operation on data loaded contiguously > > > +into the vectors. > > > > Nitpicking, sorry, but I think it would be better to describe the > > layout directly rather than in terms of loads, since the preceding > > operation might not be a load. > > So if we're at that and since GCC vectors do not have complex components > can we formulate this in terms avoiding 'complex'? > Isn't this an add of one vector to a vector with adjacant lanes swapped and > possibly negated? Mentioning that this would match a complex add in case > lanes happen to match up with complex real/imag parts is OK but the pattern > should work equally well if there's no complex numbers involved? > > > I guess the main question is: what representation do we expect for > > big-endian? A normal Advanced SIMD LDR would give this (for floats): > > > > MEMORY > > +-----+-----+-----+-----+ > > | r0 | i0 | r1 | i1 | > > +-----+-----+-----+-----+ > > | 0 | 1 | 2 | 3 | array numbering > > +-----+-----+-----+-----+ > > V V V V Advanced SIMD LDR > > +-----+-----+-----+-----+ > > | r0 | i0 | r1 | i1 | > > +-----+-----+-----+-----+ > > | 0 | 1 | 2 | 3 | GCC lane numbering > > +-----+-----+-----+-----+ > > | 3 | 2 | 1 | 0 | Arm lane numbering > > +-----+-----+-----+-----+ > > MSB REGISTER LSB > > > > but the FC* instructions put the imaginary parts in the more > > significant lane, so the pairs of elements above would need to be > > reversed: > > > > MEMORY > > +-----+-----+-----+-----+ > > | r0 | i0 | r1 | i1 | > > +-----+-----+-----+-----+ > > | 0 | 1 | 2 | 3 | array numbering > > +-----+-----+-----+-----+ > > \ / \ / > > \ / \ / > > X X Load and permute > > / \ / \ > > / \ / \ > > +-----+-----+-----+-----+ > > | i0 | r0 | i1 | r1 | > > +-----+-----+-----+-----+ > > | 0 | 1 | 2 | 3 | GCC lane numbering > > +-----+-----+-----+-----+ > > | 3 | 2 | 1 | 0 | Arm lane numbering > > +-----+-----+-----+-----+ > > MSB REGISTER LSB > > > > (Or the whole vector could be reversed.) > > > > We might decide that it just isn't worth doing this for Advanced SIMD. > > But should the semantics of the optab be that: > > > > (1) GCC lane number 0 holds a real part, or > > (2) the least significant lane holds a real part? > > > > With (1), it would be up to the target to hide the permute above. > > With (2), the vectoriser would need to introduce the permute itself. > > > > I'm not sure there's a perfect answer even for Arm targets. (2) > > matches the Advanced SIMD semantics. But for SVE, the register layout > > follows > > LD1 rather than LDR, and the GCC and architectural lane numbering match > up. > > (1) would therefore be better than (2) for SVE (and so no permute > > would be needed for either endianness on SVE). > > > > > +The operation is only supported for vector modes @var{n} and with > > > +rotations @var{m} of 90 or 270. > > > + > > > +This pattern is not allowed to @code{FAIL}. > > > + > > > @cindex @code{ffs@var{m}2} instruction pattern @item > > > @samp{ffs@var{m}2} Store into operand 0 one plus the index of the > > > least significant 1-bit diff --git a/gcc/internal-fn.def > > > b/gcc/internal-fn.def index > > > > 13e60828fcf5db6c5f15aae2bacd4cf04029e430..956a65a338c157b51de7e78a3f > > > b005b5af78ef31 100644 > > > --- a/gcc/internal-fn.def > > > +++ b/gcc/internal-fn.def > > > @@ -275,6 +275,8 @@ DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST, scalb, > > > binary) DEF_INTERNAL_FLT_FLOATN_FN (FMIN, ECF_CONST, fmin, > binary) > > > DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary) > > > DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary) > > > +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, > cadd90, > > > +binary) DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, > ECF_CONST, > > > +cadd270, binary) > > > > > > /* FP scales. */ > > > DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary) diff --git > > > a/gcc/optabs.def b/gcc/optabs.def index > > > > 78409aa14537d259bf90277751aac00d452a0d3f..2bb0bf857977035bf562a77f5f > > > 6848e80edf936d 100644 > > > --- a/gcc/optabs.def > > > +++ b/gcc/optabs.def > > > @@ -290,6 +290,8 @@ OPTAB_D (atan_optab, "atan$a2") OPTAB_D > > > (atanh_optab, "atanh$a2") OPTAB_D (copysign_optab, "copysign$F$a3") > > > OPTAB_D (xorsign_optab, "xorsign$F$a3") > > > +OPTAB_D (cadd90_optab, "cadd90$a3") OPTAB_D (cadd270_optab, > > > +"cadd270$a3") > > > OPTAB_D (cos_optab, "cos$a2") > > > OPTAB_D (cosh_optab, "cosh$a2") > > > OPTAB_D (exp10_optab, "exp10$a2") > > > diff --git a/gcc/tree-vect-slp-patterns.c > > > b/gcc/tree-vect-slp-patterns.c index > > > > 6453a5b1b6464dba833adc2c2a194db5e712bb79..b2b0ac62e9a69145470f41d2 > ba > > > c736dd970be735 100644 > > > --- a/gcc/tree-vect-slp-patterns.c > > > +++ b/gcc/tree-vect-slp-patterns.c > > > @@ -663,12 +663,94 @@ graceful_exit: > > > } > > > }; > > > > > > +class ComplexAddPattern : public ComplexPattern > > > > Another nitpick, sorry, but type names should be lower case rather > > than CamelCase. > > > > Thanks, > > Richard > > > > -- > Richard Biener <rguent...@suse.de> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 > Nuernberg, Germany; GF: Felix Imend
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 2b46286943778e16d95b15def4299bcbf8db7eb8..71e226505b2619d10982b59a4ebbed73a70f29be 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -6132,6 +6132,17 @@ floating-point mode. This pattern is not allowed to @code{FAIL}. +@cindex @code{cadd@var{m}@var{n}3} instruction pattern +@item @samp{cadd@var{m}@var{n}3} +Perform a vector addition of complex numbers in operand 1 with operand 2 +rotated by @var{m} degrees around the argand plane and storing the result in +operand 0. The instruction must perform the operation on data loaded +contiguously into the vectors. +The operation is only supported for vector modes @var{n} and with +rotations @var{m} of 90 or 270. + +This pattern is not allowed to @code{FAIL}. + @cindex @code{ffs@var{m}2} instruction pattern @item @samp{ffs@var{m}2} Store into operand 0 one plus the index of the least significant 1-bit diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def index 310d37aa53819791b5df1683afca831f08e5892a..33c54be1e158ddea25c4cd6b1148df8cf4a509b5 100644 --- a/gcc/internal-fn.def +++ b/gcc/internal-fn.def @@ -277,6 +277,9 @@ DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST, scalb, binary) DEF_INTERNAL_FLT_FLOATN_FN (FMIN, ECF_CONST, fmin, binary) DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary) DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary) +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, cadd90, binary) +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary) + /* FP scales. */ DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary) diff --git a/gcc/optabs.def b/gcc/optabs.def index 78409aa14537d259bf90277751aac00d452a0d3f..2bb0bf857977035bf562a77f5f6848e80edf936d 100644 --- a/gcc/optabs.def +++ b/gcc/optabs.def @@ -290,6 +290,8 @@ OPTAB_D (atan_optab, "atan$a2") OPTAB_D (atanh_optab, "atanh$a2") OPTAB_D (copysign_optab, "copysign$F$a3") OPTAB_D (xorsign_optab, "xorsign$F$a3") +OPTAB_D (cadd90_optab, "cadd90$a3") +OPTAB_D (cadd270_optab, "cadd270$a3") OPTAB_D (cos_optab, "cos$a2") OPTAB_D (cosh_optab, "cosh$a2") OPTAB_D (exp10_optab, "exp10$a2") diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c index 65044a77d55e24cde6c663e81c11b66ad9650056..0732cf0a6d93be8590b84c39dff82940b280e46b 100644 --- a/gcc/tree-vect-slp-patterns.c +++ b/gcc/tree-vect-slp-patterns.c @@ -157,6 +157,114 @@ typedef enum _complex_operation : unsigned { typedef struct map_ { int a; int b; } map_t; +/******************************************************************************* + * General helper functions + ******************************************************************************/ + +/* Check to see if all loads rooted in ROOT are linear. Linearity is + defined as having no gaps between values loaded. */ + +static load_permutation_t +linear_loads_p (slp_tree root, bool *linear) +{ + *linear = false; + if (!root) + return vNULL; + + unsigned i; + load_permutation_t loads = vNULL; + + if (SLP_TREE_LOAD_PERMUTATION (root).exists ()) + { + loads = SLP_TREE_LOAD_PERMUTATION (root); + unsigned leader = loads[0]; + unsigned load; + FOR_EACH_VEC_ELT_FROM (loads, i, load, 1) + if (load != ++leader) + return loads; + } + + slp_tree child; + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (root), i, child) + { + loads = linear_loads_p (child, linear); + if (!*linear) + return loads; + } + + *linear = true; + return loads; +} + +/* This function attempts to make a node rooted in NODE linear. If the node + if already linear than the node itself is returned in RESULT. + + If the node is not linear then a new VEC_PERM_EXPR node is created with a + lane permute that when applied will make the node linear. If such a + permute cannot be created then FALSE is returned from the function. + + Here linearity is defined as having a sequential, monotically increasing + load position inside the load permute generated by the loads reachable from + NODE. */ + +static bool +vect_slp_make_linear (slp_tree parent, slp_tree node, slp_tree *result) +{ + bool is_linear = false; + unsigned x, val; + load_permutation_t load_perm = linear_loads_p (node, &is_linear); + if (is_linear) + { + *result = node; + return true; + } + + /* Attempt to linearise the permute. */ + vec<std::pair<unsigned, unsigned> > zipped; + zipped.create (load_perm.length ()); + FOR_EACH_VEC_ELT (load_perm, x, val) + zipped.quick_push (std::make_pair (val, x)); + typedef const std::pair<unsigned, unsigned>* cmp_t; + zipped.qsort ([](const void *a, const void *b) -> int + { return (int)((cmp_t)a)->first - (int)((cmp_t)b)->first; }); + + /* Verify if we have a linear permute sequence. */ + if (zipped.length () > 0) + { + unsigned leader = zipped[0].first; + for (x = 1; x < zipped.length (); x++) + if(!(is_linear = (zipped[x].first == ++leader))) + break; + } + else + is_linear = true; + + if (!is_linear) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "Loads could not be made linear %p\n", + node); + zipped.release (); + return false; + } + + for (x = 0; x < zipped.length (); x++) + zipped[x].first = 0; + + /* Create the new permute node and store it instead. */ + slp_tree vnode = vect_create_new_slp_node (vNULL, 1); + SLP_TREE_CODE (vnode) = VEC_PERM_EXPR; + SLP_TREE_LANE_PERMUTATION (vnode) = zipped; + SLP_TREE_VECTYPE (vnode) = SLP_TREE_VECTYPE (parent); + SLP_TREE_CHILDREN (vnode).quick_push (node); + SLP_TREE_REF_COUNT (vnode) = 1; + SLP_TREE_LANES (vnode) = SLP_TREE_LANES (node); + SLP_TREE_REPRESENTATIVE (vnode) = SLP_TREE_REPRESENTATIVE (parent); + *result = vnode; + return is_linear; +} + /******************************************************************************* * Simple vector pattern matcher ******************************************************************************/ @@ -532,6 +640,93 @@ complex_pattern::validate_p () return true; } + +/******************************************************************************* + * complex_add_pattern class + ******************************************************************************/ + +class complex_add_pattern : public complex_pattern +{ + protected: + complex_add_pattern (slp_tree *node, vec_info *vinfo) + : complex_pattern (node, vinfo) + { + this->m_arity = 2; + this->m_num_args = 2; + } + + public: + static vect_pattern* create (slp_tree *node, vec_info *vinfo) + { + return new complex_add_pattern (node, vinfo); + } + + const char* get_name () + { + return "Complex Addition"; + } + + bool matches (); + bool matches (complex_operation_t op, vec<slp_tree> ops); +}; + +/* Pattern matcher for trying to match complex addition pattern in SLP tree. + + If no match is found then IFN is set to IFN_LAST. + This function matches the patterns shaped as: + + c[i] = a[i] - b[i+1]; + c[i+1] = a[i+1] + b[i]; + + If a match occurred then TRUE is returned, else FALSE. */ + +bool +complex_add_pattern::matches (complex_operation_t op, vec<slp_tree> ops) +{ + this->m_ifn = IFN_LAST; + this->m_ops.safe_splice (ops); + + /* Find the two components. Rotation in the complex plane will modify + the operations: + + * Rotation 0: + + + * Rotation 90: - + + * Rotation 180: - - + * Rotation 270: + - + + Rotation 0 and 180 can be handled by normal SIMD code, so we don't need + to care about them here. */ + if (op == MINUS_PLUS) + this->m_ifn = IFN_COMPLEX_ADD_ROT90; + else if (op == PLUS_MINUS) + this->m_ifn = IFN_COMPLEX_ADD_ROT270; + + /* verify that there is a permute, otherwise this isn't a pattern we + we support. */ + bool is_linear = false; + bool all_linear = true; + unsigned x; + for (x = 0; x < this->m_ops.length () && all_linear; x++) + { + linear_loads_p (this->m_ops[x], &is_linear); + all_linear = all_linear && is_linear; + } + + if (this->m_ifn == IFN_LAST || all_linear) + return false; + + save_match (); + return true; +} + +bool +complex_add_pattern::matches () +{ + complex_operation_t op + = vect_detect_pair_op (*this->m_node, true, &this->m_ops); + return matches (op, this->m_ops); +} + /******************************************************************************* * complex_operations_pattern class ******************************************************************************/ @@ -580,6 +775,13 @@ complex_operations_pattern::matches () if (op == CMPLX_NONE) return false; + /* Check which pattern this may be. Match longest pattern first. */ + this->m_patt = complex_add_pattern::create (this->m_node, this->m_vinfo); + if (this->m_patt->matches (op, this->m_ops)) + return true; + + delete this->m_patt; + this->m_patt = NULL; return false; }