Hi All,

This patch series adds support for SLP vectorization of complex instructions [1].

These instructions exist only in vector form, and recognizing them requires
matching two statements in parallel.  Complex operations usually require a
permute because the real and imaginary parts are stored interleaved, but these
vector instructions expect that layout, so the compiler no longer needs to
generate a permute.
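
As a rough illustration of the kind of scalar code involved, here is a minimal
sketch of a complex multiply over interleaved storage.  The function name and
the array layout are assumptions made for this example and are not taken from
the patch.

```c
#include <stddef.h>

/* Multiply n complex numbers stored interleaved as
   { re0, im0, re1, im1, ... }.  Hypothetical helper for illustration only.  */
void
cmul_interleaved (float *restrict c, const float *restrict a,
		  const float *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      float ar = a[2 * i], ai = a[2 * i + 1];
      float br = b[2 * i], bi = b[2 * i + 1];

      /* Two statements per complex element.  The even (real) and odd
	 (imaginary) lanes use different operations and mix real and
	 imaginary inputs, which is what forces permutes when this is
	 vectorized with ordinary vector instructions.  */
      c[2 * i] = ar * br - ai * bi;
      c[2 * i + 1] = ar * bi + ai * br;
    }
}
```

The two stores form the pair of statements that would have to be recognized
together.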

For this reason the pass also re-orders the loads in the SLP tree so that they
become contiguous and no longer need the permutes.  The basic blocks are left
untouched so that the scalar loop will still correctly issue permutes.

The instructions also support rotations in the Argand plane; as such, the
operands have to be re-ordered to match their load group.
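
To make the effect of a rotation concrete, here is a small sketch of a complex
addition where the second operand is first rotated by a multiple of 90 degrees.
The type and function names are hypothetical, and the sketch only models the
arithmetic, not how the pass or the instructions are implemented.

```c
/* Hypothetical complex type used only for this illustration.  */
typedef struct
{
  float re, im;
} cplx;

/* Rotate z by quarter_turns * 90 degrees, i.e. multiply it by i^k.  */
cplx
rotate (cplx z, unsigned quarter_turns)
{
  switch (quarter_turns & 3)
    {
    case 0:
      return z;
    case 1:
      return (cplx) { -z.im, z.re };
    case 2:
      return (cplx) { -z.re, -z.im };
    default:
      return (cplx) { z.im, -z.re };
    }
}

/* a + rotate (b).  For odd quarter turns the real lane of the result
   consumes the imaginary part of b and vice versa, so the operand order
   within the group matters.  */
cplx
cadd_rot (cplx a, cplx b, unsigned quarter_turns)
{
  cplx r = rotate (b, quarter_turns);
  return (cplx) { a.re + r.re, a.im + r.im };
}
```

For example, with a = 1 + 2i and b = 3 + 4i, a rotation of 180 degrees yields
a - b.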

For now, this patch only adds support for:

  * Complex Addition with rotation of 0 and 180.
  * Complex Multiplication and Multiplication where one operand is conjugated.
  * Complex FMA and FMA where one operand is conjugated.
  * Complex FMS and FMS where one operand is conjugated.
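
For reference, the arithmetic behind the multiplication-based operations listed
above can be sketched in scalar form as follows.  The names are hypothetical,
and the FMA/FMS definitions (acc + a * b and acc - a * b) are an assumption
about the intended semantics rather than something taken from the patch.

```c
/* Hypothetical complex type used only for this illustration.  */
typedef struct
{
  float re, im;
} cfloat;

/* a * b.  */
cfloat
cmul (cfloat a, cfloat b)
{
  return (cfloat) { a.re * b.re - a.im * b.im,
		    a.re * b.im + a.im * b.re };
}

/* a * conj (b): the same shape as cmul with the signs on the b.im
   terms flipped.  */
cfloat
cmul_conj (cfloat a, cfloat b)
{
  return (cfloat) { a.re * b.re + a.im * b.im,
		    a.im * b.re - a.re * b.im };
}

/* acc + a * b (assumed meaning of complex FMA).  */
cfloat
cfma (cfloat acc, cfloat a, cfloat b)
{
  cfloat p = cmul (a, b);
  return (cfloat) { acc.re + p.re, acc.im + p.im };
}

/* acc - a * b (assumed meaning of complex FMS).  */
cfloat
cfms (cfloat acc, cfloat a, cfloat b)
{
  cfloat p = cmul (a, b);
  return (cfloat) { acc.re - p.re, acc.im - p.im };
}
```

The conjugated FMA and FMS variants would follow the same pattern with
cmul_conj in place of cmul.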
  
Complex dot-product is not currently supported in this patch set as build_slp
fails for it.  This will be provided as a future patch.
  
These are supported for both integer and floating point, and as such the
patterns do not look for real or imaginary pairs but instead rely on the early
lowering of complex numbers by GCC and on canonicalization of the operations,
so that the pass simply recognizes any instruction sequence matching the
requested operations.

To be safe, when the pass is not sure it can support the operation, or if it
finds something it does not understand, it backs off.

This patch is an RFC and I am looking for feedback on the approach.  In
particular, this series has one problem, which arises when it is decided that
SLP is not viable and that the normal loop vectorizer is to be used.

In this case I dissolve the changes, but the compiler crashes because the use of
the pattern matcher essentially undoes two_operands.  This means that the number
of copies needed differs depending on whether the patterns are used.  When using
the patterns the two operands become the same and so are treated as manually
unrolled loops.  The problem is that nunits has already been decided along with
the unroll factor, so when the dissolved statements are then analyzed they
fail.  This is also the reason why I cannot analyze both the pattern and the
original statements initially.

The relevant places in the source code have comments describing the problem.

[1] https://developer.arm.com/documentation/ddi0487/fc/

Thanks,
Tamar
