https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122069

--- Comment #11 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tamar Christina <[email protected]>:

https://gcc.gnu.org/g:75fb400d2950e1f743f133ece8fb3abe815faf13

commit r16-4483-g75fb400d2950e1f743f133ece8fb3abe815faf13
Author: Tamar Christina <[email protected]>
Date:   Sat Oct 18 08:22:50 2025 +0100

    AArch64: Implement widen_[us]sum using 2-way [US]UDOT for SVE2p1 [PR122069]

    SVE2p1 adds a 2-way dot product, which we can use when we have to do a
    single-step widening addition.  This is useful, for instance, when the
    value to be widened does not come from a load.  For example, for

    int foo2_int(unsigned short *x, unsigned short * restrict y) {
      int sum = 0;
      for (int i = 0; i < 8000; i++)
        {
          x[i] = x[i] + y[i];
          sum += x[i];
        }
      return sum;
    }

    we used to generate

    .L12:
            ld1h    z30.h, p7/z, [x0, x2, lsl 1]
            ld1h    z29.h, p7/z, [x1, x2, lsl 1]
            add     z30.h, z30.h, z29.h
            uaddwb  z31.s, z31.s, z30.h
            uaddwt  z31.s, z31.s, z30.h
            st1h    z30.h, p7, [x0, x2, lsl 1]
            mov     x3, x2
            inch    x2
            cmp     w2, w4
            bls     .L12
            inch    x3
            uaddv   d31, p7, z31.s

    but with +sve2p1 we now generate

    .L12:
            ld1h    z31.h, p7/z, [x0, x2, lsl 1]
            ld1h    z29.h, p7/z, [x1, x2, lsl 1]
            add     z31.h, z31.h, z29.h
            udot    z30.s, z31.h, z28.h
            st1h    z31.h, p7, [x0, x2, lsl 1]
            mov     x3, x2
            inch    x2
            cmp     w2, w4
            bls     .L12
            inch    x3
            uaddv   d30, p7, z30.s

    gcc/ChangeLog:

            PR middle-end/122069
            * config/aarch64/aarch64-sve2.md
            (widen_ssum<mode><Vnarrow>3): Update.
            (widen_usum<mode><Vnarrow>3): Update.

    gcc/testsuite/ChangeLog:

            PR middle-end/122069
            * gcc.target/aarch64/sve2/pr122069_3.c: New test.
            * gcc.target/aarch64/sve2/pr122069_4.c: New test.
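
The replacement of the uaddwb/uaddwt pair by a single udot relies on a simple
identity: a 2-way dot product against a vector of ones is a widening sum of
adjacent element pairs.  The scalar C model below is only a sketch of that
identity, not the code in the patch.  It assumes that z28 in the listing above
holds all ones (its setup lies outside the loop shown), and the names LANES,
widen_add_pair, udot_2way and models_agree are hypothetical helpers introduced
for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed lane count for this model; real SVE vectors are
   length-agnostic.  */
#define LANES 4

/* Model of UADDWB followed by UADDWT: 32-bit lane i accumulates the even
   (bottom) and then the odd (top) 16-bit element that it overlaps.  */
void widen_add_pair(uint32_t acc[LANES], const uint16_t x[2 * LANES])
{
  for (size_t i = 0; i < LANES; i++)
    {
      acc[i] += x[2 * i];      /* uaddwb */
      acc[i] += x[2 * i + 1];  /* uaddwt */
    }
}

/* Model of a 2-way unsigned dot product: 32-bit lane i accumulates the
   products of the two adjacent 16-bit element pairs it overlaps.  */
void udot_2way(uint32_t acc[LANES], const uint16_t x[2 * LANES],
               const uint16_t y[2 * LANES])
{
  for (size_t i = 0; i < LANES; i++)
    acc[i] += (uint32_t) x[2 * i] * y[2 * i]
              + (uint32_t) x[2 * i + 1] * y[2 * i + 1];
}

/* Returns 1 if both models produce the same accumulators for a sample
   input when the second dot-product operand is all ones, else 0.  */
int models_agree(void)
{
  const uint16_t x[2 * LANES] = { 1, 2, 3, 65535, 65535, 65535, 0, 7 };
  uint16_t ones[2 * LANES];
  uint32_t a[LANES] = { 0, 10, 20, 30 };
  uint32_t b[LANES] = { 0, 10, 20, 30 };

  for (size_t i = 0; i < 2 * LANES; i++)
    ones[i] = 1;

  widen_add_pair(a, x);
  udot_2way(b, x, ones);

  for (size_t i = 0; i < LANES; i++)
    if (a[i] != b[i])
      return 0;
  return 1;
}
```

Calling models_agree() from a small test driver should return 1.  The model
covers only the unsigned case; the signed variant would presumably follow the
same shape with int16_t and int32_t.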
