Hi All,

This patch series adds support for SLP vectorization of complex instructions [1].
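
As a rough illustration of the kind of scalar sequence involved, here is a minimal sketch of a complex addition in which the second operand is rotated by 90 degrees (c = a + i*b). It is not taken from the patches: the function name cadd_rot90 and the use of float arrays holding interleaved (real, imaginary) pairs are assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical example, not code from the patch series.
   Computes c = a + i*b for n complex numbers, where each array stores
   the numbers as interleaved (real, imaginary) float pairs.  */
void cadd_rot90(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* i * (b.re + i*b.im) = -b.im + i*b.re  */
        c[2 * i]     = a[2 * i]     - b[2 * i + 1];  /* real part */
        c[2 * i + 1] = a[2 * i + 1] + b[2 * i];      /* imaginary part */
    }
}
```

Note that each output element reads the opposite half of the pair in b, which is why a vectorizer that treats the two statements independently ends up needing a permute.
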
These instructions exist only in their vector forms and require the compiler to recognize two statements in parallel. Complex operations usually require a permute because the real and imaginary parts are stored interleaved, but these vector instructions expect that layout, so the compiler no longer needs to generate a permute. For this reason the pass also re-orders the loads in the SLP tree so that they become contiguous and no longer need the permutes. The basic blocks are left untouched, so the scalar loop will still correctly issue permutes.

The instructions also support rotations along the Argand plane, so the operands have to be re-ordered to coincide with their load group.

For now, this patch only adds support for complex addition with rotate and complex FMLA with rotations of 0 and 180. The intention is to add support for complex subtraction and complex multiplication in the future.

The pattern matching relies on GCC's early lowering of complex numbers into real and imaginary pairs, so it simply recognizes any instruction sequence matching the requested operations. To be safe, it backs off when it is not sure it can support the operation or when it finds something it does not understand.

The hit rate of such patterns in SPEC CPU 2006 is as follows:

  Unsupported due to type casts:                    28
  Successfully matched and created:                 43
  Aborted due to unknown instruction in sequence:  354
  Total times pattern matched:                     403

This shows that this and the future enhancements are worthwhile.

On AArch64, code that uses the new instructions is about 2-3x smaller.

[1] https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile

Thanks,
Tamar

--