On Tue, 29 Sep 2020, Richard Sandiford wrote:

> Tamar Christina <tamar.christ...@arm.com> writes:
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index 
> > 2b46286943778e16d95b15def4299bcbf8db7eb8..71e226505b2619d10982b59a4ebbed73a70f29be
> >  100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -6132,6 +6132,17 @@ floating-point mode.
> >  
> >  This pattern is not allowed to @code{FAIL}.
> >  
> > +@cindex @code{cadd@var{m}@var{n}3} instruction pattern
> > +@item @samp{cadd@var{m}@var{n}3}
> > +Perform a vector addition of complex numbers in operand 1 with operand 2
> > +rotated by @var{m} degrees around the argand plane and storing the result 
> > in
> > +operand 0.  The instruction must perform the operation on data loaded
> > +contiguously into the vectors.
> 
> Nitpicking, sorry, but I think it would be better to describe the
> layout directly rather than in terms of loads, since the preceding
> operation might not be a load.

So if we're at that and since GCC vectors do not have complex
components can we formulate this in terms avoiding 'complex'?
Isn't this an add of one vector to a vector with adjacant
lanes swapped and possibly negated?  Mentioning that this would
match a complex add in case lanes happen to match up with
complex real/imag parts is OK but the pattern should work
equally well if there's no complex numbers involved?

> I guess the main question is: what representation do we expect for
> big-endian?  A normal Advanced SIMD LDR would give this (for floats):
> 
>              MEMORY
>    +-----+-----+-----+-----+
>    | r0  | i0  | r1  | i1  |
>    +-----+-----+-----+-----+
>    |  0  |  1  |  2  |  3  |   array numbering
>    +-----+-----+-----+-----+
>       V     V     V     V      Advanced SIMD LDR
>    +-----+-----+-----+-----+
>    | r0  | i0  | r1  | i1  |
>    +-----+-----+-----+-----+
>    |  0  |  1  |  2  |  3  |   GCC lane numbering
>    +-----+-----+-----+-----+
>    |  3  |  2  |  1  |  0  |   Arm lane numbering
>    +-----+-----+-----+-----+
>   MSB       REGISTER      LSB
> 
> but the FC* instructions put the imaginary parts in the more
> significant lane, so the pairs of elements above would need
> to be reversed:
> 
>              MEMORY
>    +-----+-----+-----+-----+
>    | r0  | i0  | r1  | i1  |
>    +-----+-----+-----+-----+
>    |  0  |  1  |  2  |  3  |   array numbering
>    +-----+-----+-----+-----+
>        \   /       \   /
>         \ /         \ /
>          X           X         Load and permute
>         / \         / \
>        /   \       /   \
>    +-----+-----+-----+-----+
>    | i0  | r0  | i1  | r1  |
>    +-----+-----+-----+-----+
>    |  0  |  1  |  2  |  3  |   GCC lane numbering
>    +-----+-----+-----+-----+
>    |  3  |  2  |  1  |  0  |   Arm lane numbering
>    +-----+-----+-----+-----+
>   MSB       REGISTER      LSB
> 
> (Or the whole vector could be reversed.)
> 
> We might decide that it just isn't worth doing this for Advanced SIMD.
> But should the semantics of the optab be that:
> 
> (1) GCC lane number 0 holds a real part, or
> (2) the least significant lane holds a real part?
> 
> With (1), it would be up to the target to hide the permute above.
> With (2), the vectoriser would need to introduce the permute itself.
> 
> I'm not sure there's a perfect answer even for Arm targets.  (2) matches
> the Advanced SIMD semantics.  But for SVE, the register layout follows
> LD1 rather than LDR, and the GCC and architectural lane numbering match up.
> (1) would therefore be better than (2) for SVE (and so no permute would be
> needed for either endianness on SVE).
> 
> > +The operation is only supported for vector modes @var{n} and with
> > +rotations @var{m} of 90 or 270.
> > +
> > +This pattern is not allowed to @code{FAIL}.
> > +
> >  @cindex @code{ffs@var{m}2} instruction pattern
> >  @item @samp{ffs@var{m}2}
> >  Store into operand 0 one plus the index of the least significant 1-bit
> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > index 
> > 13e60828fcf5db6c5f15aae2bacd4cf04029e430..956a65a338c157b51de7e78a3fb005b5af78ef31
> >  100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -275,6 +275,8 @@ DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST, scalb, binary)
> >  DEF_INTERNAL_FLT_FLOATN_FN (FMIN, ECF_CONST, fmin, binary)
> >  DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary)
> >  DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary)
> > +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, cadd90, binary)
> > +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
> >  
> >  /* FP scales.  */
> >  DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
> > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > index 
> > 78409aa14537d259bf90277751aac00d452a0d3f..2bb0bf857977035bf562a77f5f6848e80edf936d
> >  100644
> > --- a/gcc/optabs.def
> > +++ b/gcc/optabs.def
> > @@ -290,6 +290,8 @@ OPTAB_D (atan_optab, "atan$a2")
> >  OPTAB_D (atanh_optab, "atanh$a2")
> >  OPTAB_D (copysign_optab, "copysign$F$a3")
> >  OPTAB_D (xorsign_optab, "xorsign$F$a3")
> > +OPTAB_D (cadd90_optab, "cadd90$a3")
> > +OPTAB_D (cadd270_optab, "cadd270$a3")
> >  OPTAB_D (cos_optab, "cos$a2")
> >  OPTAB_D (cosh_optab, "cosh$a2")
> >  OPTAB_D (exp10_optab, "exp10$a2")
> > diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
> > index 
> > 6453a5b1b6464dba833adc2c2a194db5e712bb79..b2b0ac62e9a69145470f41d2bac736dd970be735
> >  100644
> > --- a/gcc/tree-vect-slp-patterns.c
> > +++ b/gcc/tree-vect-slp-patterns.c
> > @@ -663,12 +663,94 @@ graceful_exit:
> >      }
> >  };
> >  
> > +class ComplexAddPattern : public ComplexPattern
> 
> Another nitpick, sorry, but type names should be lower case rather than
> CamelCase.
> 
> Thanks,
> Richard
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend

Reply via email to