https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122069

--- Comment #10 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tamar Christina <[email protected]>:

https://gcc.gnu.org/g:25c8a8d4318d0fa25d79b4b9f60865da2d6c5e60

commit r16-4482-g25c8a8d4318d0fa25d79b4b9f60865da2d6c5e60
Author: Tamar Christina <[email protected]>
Date:   Sat Oct 18 08:22:18 2025 +0100

    AArch64: Implement widen_[us]sum using [US]ADDW[TB] for SVE2 [PR122069]

    SVE2 adds [US]ADDW[TB], which we can use when we need a single-step
    widening addition.  This is useful, for instance, when the value to be
    widened does not come from a load.  For example, for

    int foo2_int(unsigned short *x, unsigned short * restrict y) {
      int sum = 0;
      for (int i = 0; i < 8000; i++)
        {
          x[i] = x[i] + y[i];
          sum += x[i];
        }
      return sum;
    }

    we used to generate

    .L6:
            ld1h    z1.h, p7/z, [x0, x2, lsl 1]
            ld1h    z29.h, p7/z, [x1, x2, lsl 1]
            add     z29.h, z29.h, z1.h
            punpklo p6.h, p7.b
            uunpklo z0.s, z29.h
            add     z31.s, p6/m, z31.s, z0.s
            punpkhi p6.h, p7.b
            uunpkhi z30.s, z29.h
            add     z31.s, p6/m, z31.s, z30.s
            st1h    z29.h, p7, [x0, x2, lsl 1]
            add     x2, x2, x4
            whilelo p7.h, w2, w3
            b.any   .L6
            ptrue   p7.b, all
            uaddv   d31, p7, z31.s

    but with +sve2 we now generate

    .L12:
            ld1h    z30.h, p7/z, [x0, x2, lsl 1]
            ld1h    z29.h, p7/z, [x1, x2, lsl 1]
            add     z30.h, z30.h, z29.h
            uaddwb  z31.s, z31.s, z30.h
            uaddwt  z31.s, z31.s, z30.h
            st1h    z30.h, p7, [x0, x2, lsl 1]
            mov     x3, x2
            inch    x2
            cmp     w2, w4
            bls     .L12
            inch    x3
            uaddv   d31, p7, z31.s

    gcc/ChangeLog:

            PR middle-end/122069
            * config/aarch64/aarch64-sve2.md (widen_ssum<mode><Vnarrow>3): New.
            (widen_usum<mode><Vnarrow>3): New.
            * config/aarch64/iterators.md (Vnarrow): New, to match VNARROW.

    gcc/testsuite/ChangeLog:

            PR middle-end/122069
            * gcc.target/aarch64/sve2/pr122069_1.c: New test.
            * gcc.target/aarch64/sve2/pr122069_2.c: New test.
