https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122069
--- Comment #9 from GCC Commits <cvs-commit at gcc dot gnu.org> --- The master branch has been updated by Tamar Christina <[email protected]>: https://gcc.gnu.org/g:2f719014bfec1d21eceb409db6eff78eb92942f3 commit r16-4481-g2f719014bfec1d21eceb409db6eff78eb92942f3 Author: Tamar Christina <[email protected]> Date: Sat Oct 18 08:21:56 2025 +0100 AArch64: Implement widen_[us]sum using dotproduct for SVE [PR122069] This patch implements support for using dotproduct to do sum reductions by changing += a into += (a * 1). i.e. we seed the multiplication with 1. Given the example int foo_int(unsigned char *x, unsigned char * restrict y) { int sum = 0; for (int i = 0; i < 8000; i++) sum += char_abs(x[i] - y[i]); return sum; } we used to generate .L2: ld1b z1.b, p7/z, [x0, x2] ld1b z29.b, p7/z, [x1, x2] sub z29.b, z1.b, z29.b uunpklo z0.h, z29.b uunpkhi z29.h, z29.b uunpklo z30.s, z0.h add z31.s, p6/m, z31.s, z30.s uunpkhi z0.s, z0.h add z31.s, p5/m, z31.s, z0.s uunpklo z28.s, z29.h add z31.s, p4/m, z31.s, z28.s uunpkhi z29.s, z29.h add z31.s, p3/m, z31.s, z29.s add x2, x2, x7 whilelo p7.b, w2, w3 whilelo p3.s, w2, w6 whilelo p4.s, w2, w5 whilelo p5.s, w2, w4 whilelo p6.s, w2, w3 b.any .L2 ptrue p7.b, all uaddv d31, p7, z31.s but now generates with +dotprod .L3: ld1b z30.b, p7/z, [x5, x2] ld1b z29.b, p7/z, [x1, x2] sub z30.b, z30.b, z29.b udot z31.s, z30.b, z28.b mov x3, x2 add x2, x2, x6 cmp w2, w0 bls .L3 incb x3 uaddv d31, p7, z31.s gcc/ChangeLog: PR middle-end/122069 * config/aarch64/aarch64-sve.md (widen_<sur>sum<mode><vsi2qi>3): New. gcc/testsuite/ChangeLog: PR middle-end/122069 * gcc.target/aarch64/sve/pr122069_1.c: New test. * gcc.target/aarch64/sve/pr122069_2.c: New test.
