On 10/31/22 05:56, Tamar Christina wrote:
Hi All,

This patch series adds recognition of pairwise operations (reductions)
to match.pd so that we can benefit from them even at -O1, when the vectorizer
isn't enabled.

The use of these allows for much simpler codegen on AArch64 and lets us
avoid quite a few codegen warts.

As an example, a simple:

typedef float v4sf __attribute__((vector_size (16)));

float
foo3 (v4sf x)
{
   return x[1] + x[2];
}

currently generates:

foo3:
         dup     s1, v0.s[1]
         dup     s0, v0.s[2]
         fadd    s0, s1, s0
         ret

while with this patch series it now generates:

foo3:
        ext     v0.16b, v0.16b, v0.16b, #4
        faddp   s0, v0.2s
        ret
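To see why the two-instruction sequence computes x[1] + x[2], here is a host-side C model of the two instructions' semantics as documented for A64: EXT with both source operands equal to v0 and #4 rotates the register down by one 32-bit lane, and scalar FADDP on .2s adds lanes 0 and 1. The helper names ext_bytes, faddp_2s and foo3_model are hypothetical, introduced only for this sketch; it models lane movement and is not how GCC represents the operation.

```c
#include <assert.h>
#include <string.h>

/* Model of EXT Vd.16B, Vn.16B, Vm.16B, #IMM with Vn == Vm == SRC:
   the result is bytes IMM..15 of Vn followed by bytes 0..IMM-1 of Vm,
   i.e. a rotation of SRC down by IMM bytes.  */
static void
ext_bytes (unsigned char dst[16], const unsigned char src[16], int imm)
{
  unsigned char cat[32];

  assert (imm >= 0 && imm < 16);   /* EXT .16B takes #0..#15.  */
  memcpy (cat, src, 16);           /* Low half: Vn.  */
  memcpy (cat + 16, src, 16);      /* High half: Vm (same register).  */
  memcpy (dst, cat + imm, 16);
}

/* Model of scalar FADDP Sd, Vn.2S: add 32-bit lanes 0 and 1.  */
static float
faddp_2s (const float v[4])
{
  return v[0] + v[1];
}

/* Model of the new foo3 sequence: ext #4, then faddp.  */
float
foo3_model (const float x[4])
{
  unsigned char in[16], out[16];
  float v[4];

  memcpy (in, x, sizeof in);
  ext_bytes (out, in, 4);          /* v = { x[1], x[2], x[3], x[0] }.  */
  memcpy (v, out, sizeof v);
  return faddp_2s (v);             /* x[1] + x[2].  */
}
```

For example, with x = { 1, 2, 4, 8 } the model returns 2 + 4 = 6, the same value the old dup/dup/fadd sequence produces.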

This patch does not perform the transformation if the source is not a GIMPLE
register; memory sources are left to the vectorizer, which is able to deal
correctly with clobbers.
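A minimal illustration of the two cases, assuming the GCC/Clang vector extension used in foo3. The function names sum_reg and sum_mem are hypothetical, and which path handles each one is inferred from the description above rather than checked against the patch: both compute the same value, but only the first reads its lanes from a value that would normally be an SSA name (a GIMPLE register).

```c
typedef float v4sf __attribute__ ((vector_size (16)));

/* X is a function argument, so in GIMPLE it would normally be an SSA
   name, i.e. a GIMPLE register: the kind of source the new match.pd
   patterns are described as handling.  */
float
sum_reg (v4sf x)
{
  return x[1] + x[2];
}

/* Both lanes are read through a pointer, i.e. from memory: per the
   description, this case is left to the vectorizer.  */
float
sum_mem (const v4sf *p)
{
  return (*p)[1] + (*p)[2];
}
```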

The use of these instructions makes a significant difference in codegen quality
for AArch64 and Arm.

NOTE: The last entry in the series contains tests for all of the previous
patches as it's a bit of an all or nothing thing.

Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu
with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

        * match.pd (adjacent_data_access_p): Import.
        Add new pattern for pairwise plus, min, max, fmax, fmin.
        * tree-cfg.cc (verify_gimple_call): Allow function arguments in IFNs.
        * tree.cc (adjacent_data_access_p): New.
        * tree.h (adjacent_data_access_p): New.

Nice stuff.  I'd pondered some similar stuff at Tachyum, but got dragged away before it could be implemented.





diff --git a/gcc/tree.cc b/gcc/tree.cc
index 007c9325b17076f474e6681c49966c59cf6b91c7..5315af38a1ead89ca5f75dc4b19de9841e29d311 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -10457,6 +10457,90 @@ bitmask_inv_cst_vector_p (tree t)
    return builder.build ();
  }
+/* Returns base address if the two operands represent adjacent access of data
+   such that a pairwise operation can be used.  OP1 must be a lower subpart
+   than OP2.  If POS is not NULL then on return if a value is returned POS
+   will indicate the position of the lower address.  If COMMUTATIVE_P then
+   the operation is also tried by flipping op1 and op2.  */
+
+tree adjacent_data_access_p (tree op1, tree op2, poly_uint64 *pos,
+                            bool commutative_p)

Formatting nit.  Return type on a different line.
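To make the contract in the comment above concrete, here is a simplified, hypothetical model of it, laid out in the GNU style the nit asks for (return type on its own line, function name in column 1). struct subpart and adjacent_access_model are invented for illustration: the real function works on GCC trees and returns a tree, and requiring equal access sizes is an assumption of this sketch, not something the patch states.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a subpart access: which object is read
   (BASE), plus the bit offset and bit width of the read.  */
struct subpart
{
  int base;
  unsigned long pos;
  unsigned long size;
};

/* Return OP1's base if OP1 and OP2 read equal-sized, adjacent pieces
   of the same object with OP1 at the lower offset, else -1.  If POS is
   non-NULL and a base is returned, store the lower offset in *POS.  If
   COMMUTATIVE_P, also try with OP1 and OP2 swapped.  */
int
adjacent_access_model (const struct subpart *op1, const struct subpart *op2,
                       unsigned long *pos, bool commutative_p)
{
  if (op1->base == op2->base
      && op1->size == op2->size
      && op1->pos + op1->size == op2->pos)
    {
      if (pos)
        *pos = op1->pos;
      return op1->base;
    }
  if (commutative_p)
    return adjacent_access_model (op2, op1, pos, false);
  return -1;
}
```

With lanes 1 and 2 of a v4sf modeled as { base, 32, 32 } and { base, 64, 32 }, the call succeeds in that order, and in the reversed order only when COMMUTATIVE_P is true.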


OK with that fixed.


jeff

