[Bug middle-end/122069] Missed use of UDOT for char->int reduction

cvs-commit at gcc dot gnu.org via Gcc-bugs Sat, 18 Oct 2025 15:51:13 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122069


--- Comment #5 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tamar Christina <[email protected]>:

https://gcc.gnu.org/g:c8dc5d5070c09792bf8d224cac90989885818aaf

commit r16-4477-gc8dc5d5070c09792bf8d224cac90989885818aaf
Author: Tamar Christina <[email protected]>
Date:   Sat Oct 18 08:20:07 2025 +0100

    AArch64: add double widen_sum optab using dotprod for Adv.SIMD [PR122069]

    This patch implements support for using dot product instructions to do
    widening sum reductions by rewriting sum += a as sum += (a * 1), i.e. we
    seed the multiplication with 1.

    Given the example

    int foo_int(unsigned char *x, unsigned char * restrict y) {
      int sum = 0;
      for (int i = 0; i < 8000; i++)
         sum += char_abs(x[i] - y[i]);
      return sum;
    }

    we used to generate

    .L2:
            ldr     q0, [x0, x2]
            ldr     q28, [x1, x2]
            sub     v28.16b, v0.16b, v28.16b
            zip1    v29.16b, v28.16b, v31.16b
            zip2    v28.16b, v28.16b, v31.16b
            uaddw   v30.4s, v30.4s, v29.4h
            uaddw2  v30.4s, v30.4s, v29.8h
            uaddw   v30.4s, v30.4s, v28.4h
            uaddw2  v30.4s, v30.4s, v28.8h
            add     x2, x2, 16
            cmp     x2, x3
            bne     .L2
            addv    s31, v30.4s

    but now generate the following with +dotprod

    .L2:
            ldr     q29, [x0, x2]
            ldr     q28, [x1, x2]
            sub     v28.16b, v29.16b, v28.16b
            udot    v31.4s, v28.16b, v30.16b
            add     x2, x2, 16
            cmp     x2, x3
            bne     .L2
            addv    s31, v31.4s
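
    The equivalence the patch relies on can be checked with a scalar model.
    The sketch below is illustrative only and is not taken from the patch:
    the names widen_sum_u8, udot_model and dot_sum_u8 are hypothetical, the
    model assumes the second UDOT operand holds 1 in every byte, and it
    assumes the element count is a multiple of 16, as in the 8000-iteration
    example above.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain widening sum: each byte is zero-extended and added to a
   32-bit total.  (Hypothetical helper, for comparison only.) */
static uint32_t widen_sum_u8(const uint8_t *a, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Scalar model of one UDOT step on 16 bytes: 32-bit lane k accumulates
   the products of bytes 4k..4k+3 of the two inputs.  (Hypothetical
   helper; it models the arithmetic, not the instruction encoding.) */
static void udot_model(uint32_t acc[4], const uint8_t a[16],
                       const uint8_t b[16])
{
    for (int lane = 0; lane < 4; lane++)
        for (int j = 0; j < 4; j++)
            acc[lane] += (uint32_t)a[4 * lane + j] * b[4 * lane + j];
}

/* Sum reduction expressed as a dot product against all ones.
   Assumption: n is a multiple of 16; any tail is ignored here. */
static uint32_t dot_sum_u8(const uint8_t *a, size_t n)
{
    uint8_t ones[16];
    uint32_t acc[4] = { 0, 0, 0, 0 };

    for (int i = 0; i < 16; i++)
        ones[i] = 1;                    /* the "* 1" seed */

    for (size_t i = 0; i + 16 <= n; i += 16)
        udot_model(acc, a + i, ones);   /* one vector iteration */

    /* Final horizontal add across the four lanes, like addv. */
    return acc[0] + acc[1] + acc[2] + acc[3];
}
```

    Because every product is a[i] * 1, each lane simply collects the sum of
    its four bytes, so the result matches the plain widening sum while
    needing a single accumulate per 16 input bytes.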

    gcc/ChangeLog:

            PR middle-end/122069
            * config/aarch64/aarch64-simd.md (widen_ssum<mode><vsi2qi>3): New.
            (widen_usum<mode><vsi2qi>3): New.

    gcc/testsuite/ChangeLog:

            PR middle-end/122069
            * gcc.target/aarch64/pr122069_3.c: New test.
            * gcc.target/aarch64/pr122069_4.c: New test.

[Bug middle-end/122069] Missed use of UDOT for char->int reduction
