Hi All,

This patch series adds support for SLP vectorization of complex instructions [1].
These instructions exist only in their vector forms and require the compiler to recognize two statements in parallel. Complex operations usually require a permute because the real and imaginary parts are stored interleaved, but these vector instructions expect that layout, so the compiler no longer needs to generate a permute. For this reason the pass also re-orders the loads in the SLP tree such that they become contiguous and no longer need the permutes. The basic blocks are left untouched so that the scalar loop will still correctly issue permutes.

The instructions also support rotations along the Argand plane; as such the operands have to be re-ordered to coincide with their load group.

For now, this patch only adds support for:

 * Complex Addition with rotation of 90 and 270.
 * Complex Multiplication and Multiplication where one operand is conjugated.
 * Complex FMA and FMA where one operand is conjugated.
 * Complex FMS and FMS where one operand is conjugated.

Complex dot-product is not currently supported in this patch set as build_slp fails for it. This will be provided as a future patch.

These are supported for both integer and floating point, and as such the patterns don't look for real or imaginary pairs but instead rely on the early lowering of complex numbers by GCC and canonicalization of the operations, such that the pass just recognizes any instruction sequence matching the operations requested. To be safe, when it is not sure it can support the operation, or if it finds something it does not understand, it backs off.

This patch is an RFC and I am looking for feedback on the approach. In particular, this series has one problem, which occurs when it is decided that SLP is not viable and that the normal loop vectorizer is to be used. In this case I dissolve the changes, but the compiler crashes because the use of the pattern matcher essentially undoes two_operands. This means that the number of copies needed when using the patterns and when not are different.
When using the patterns the two operands become the same and so are treated as manually unrolled loops. The problem is that nunits has already been decided along with the unroll factor, so when the dissolved statements are then analyzed they fail. This is also the reason why I cannot analyze both the pattern and original statements initially. The relevant places in the source code have comments describing the problem.

[1] https://developer.arm.com/documentation/ddi0487/fc/

Thanks,
Tamar

--
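For reference, here is a minimal C sketch of the scalar shapes involved once complex arithmetic has been lowered to separate real and imaginary statements on interleaved data. It covers complex addition with rotations of 90 and 270 degrees (the two rotations the Arm FCADD instruction provides), complex multiplication, and multiplication by a conjugate. The function names and the exact statement forms are illustrative assumptions and are not taken from the patch; the sequences GCC actually sees after lowering and canonicalization may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Complex values are stored interleaved: {re0, im0, re1, im1, ...}.
   n is the number of complex elements, so each array holds 2 * n floats.
   Each loop body is a pair of statements (real lane, imaginary lane),
   which is the pair that has to be recognized in parallel.
   All names here are hypothetical.  */

/* c = a + b * i, i.e. b rotated by 90 degrees and then added.  */
void
cadd_rot90 (float *restrict c, const float *a, const float *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      c[2 * i]     = a[2 * i]     - b[2 * i + 1];  /* real lane */
      c[2 * i + 1] = a[2 * i + 1] + b[2 * i];      /* imaginary lane */
    }
}

/* c = a - b * i, i.e. b rotated by 270 degrees and then added.  */
void
cadd_rot270 (float *restrict c, const float *a, const float *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      c[2 * i]     = a[2 * i]     + b[2 * i + 1];
      c[2 * i + 1] = a[2 * i + 1] - b[2 * i];
    }
}

/* c = a * b.  */
void
cmul (float *restrict c, const float *a, const float *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      c[2 * i]     = a[2 * i] * b[2 * i]     - a[2 * i + 1] * b[2 * i + 1];
      c[2 * i + 1] = a[2 * i] * b[2 * i + 1] + a[2 * i + 1] * b[2 * i];
    }
}

/* c = a * conj (b).  */
void
cmul_conj (float *restrict c, const float *a, const float *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      c[2 * i]     = a[2 * i] * b[2 * i]     + a[2 * i + 1] * b[2 * i + 1];
      c[2 * i + 1] = a[2 * i + 1] * b[2 * i] - a[2 * i] * b[2 * i + 1];
    }
}
```

In every function the real lane reads the imaginary lane of an operand and vice versa, which is where the permutes come from when such loops are vectorized with ordinary vector arithmetic. As a quick check, with a = 1 + 2i and b = 5 + 6i, cadd_rot90 gives -5 + 7i and cmul_conj gives 17 + 4i.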