https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122069
--- Comment #10 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tamar Christina <[email protected]>:

https://gcc.gnu.org/g:25c8a8d4318d0fa25d79b4b9f60865da2d6c5e60

commit r16-4482-g25c8a8d4318d0fa25d79b4b9f60865da2d6c5e60
Author: Tamar Christina <[email protected]>
Date:   Sat Oct 18 08:22:18 2025 +0100

    AArch64: Implement widen_[us]sum using [US]ADDW[TB] for SVE2 [PR122069]

    SVE2 adds [US]ADDW[TB] which we can use when we have to do a single step
    widening addition.  This is useful for instance when the value to be
    widened does not come from a load.  For example for

    int foo2_int(unsigned short *x, unsigned short * restrict y) {
      int sum = 0;
      for (int i = 0; i < 8000; i++)
        {
          x[i] = x[i] + y[i];
          sum += x[i];
        }
      return sum;
    }

    we used to generate

    .L6:
            ld1h    z1.h, p7/z, [x0, x2, lsl 1]
            ld1h    z29.h, p7/z, [x1, x2, lsl 1]
            add     z29.h, z29.h, z1.h
            punpklo p6.h, p7.b
            uunpklo z0.s, z29.h
            add     z31.s, p6/m, z31.s, z0.s
            punpkhi p6.h, p7.b
            uunpkhi z30.s, z29.h
            add     z31.s, p6/m, z31.s, z30.s
            st1h    z29.h, p7, [x0, x2, lsl 1]
            add     x2, x2, x4
            whilelo p7.h, w2, w3
            b.any   .L6
            ptrue   p7.b, all
            uaddv   d31, p7, z31.s

    but with +sve2

    .L12:
            ld1h    z30.h, p7/z, [x0, x2, lsl 1]
            ld1h    z29.h, p7/z, [x1, x2, lsl 1]
            add     z30.h, z30.h, z29.h
            uaddwb  z31.s, z31.s, z30.h
            uaddwt  z31.s, z31.s, z30.h
            st1h    z30.h, p7, [x0, x2, lsl 1]
            mov     x3, x2
            inch    x2
            cmp     w2, w4
            bls     .L12
            inch    x3
            uaddv   d31, p7, z31.s

    gcc/ChangeLog:

            PR middle-end/122069
            * config/aarch64/aarch64-sve2.md: (widen_ssum<mode><Vnarrow>3): New.
            (widen_usum<mode><Vnarrow>3): New.
            * config/aarch64/iterators.md (Vnarrow): New, to match VNARROW.

    gcc/testsuite/ChangeLog:

            PR middle-end/122069
            * gcc.target/aarch64/sve2/pr122069_1.c: New test.
            * gcc.target/aarch64/sve2/pr122069_2.c: New test.
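
For readers unfamiliar with the instructions named in the commit message, here is a rough scalar C model of the UADDWB/UADDWT accumulation. It is an illustrative sketch, not code from the patch. The names uaddwb_model, uaddwt_model and widen_usum_model are hypothetical, and the fixed 128-bit vector length (8 halfwords, 4 words) is an assumption made only to keep the example small. Predication and the x[i] + y[i] store are not modeled. UADDWB adds the zero-extended even-numbered ("bottom") narrow elements into the wide accumulator, and UADDWT adds the odd-numbered ("top") ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for illustration only: a 128-bit vector, i.e. 8 x 16-bit
   narrow lanes and 4 x 32-bit wide lanes.  SVE itself is length-agnostic.  */
enum { WIDE_LANES = 4, NARROW_LANES = 2 * WIDE_LANES };

/* Model of UADDWB: add the zero-extended even-numbered ("bottom")
   narrow elements to the wide accumulator lanes.  */
static void uaddwb_model(uint32_t acc[WIDE_LANES],
                         const uint16_t src[NARROW_LANES])
{
  for (int i = 0; i < WIDE_LANES; i++)
    acc[i] += (uint32_t) src[2 * i];
}

/* Model of UADDWT: same, but for the odd-numbered ("top") elements.  */
static void uaddwt_model(uint32_t acc[WIDE_LANES],
                         const uint16_t src[NARROW_LANES])
{
  for (int i = 0; i < WIDE_LANES; i++)
    acc[i] += (uint32_t) src[2 * i + 1];
}

/* Hypothetical helper: widening sum of n halfwords, one vector per
   iteration, followed by a horizontal reduction (the role UADDV plays
   in the assembly above).  n must be a multiple of NARROW_LANES because
   predication is not modeled.  */
uint32_t widen_usum_model(const uint16_t *x, size_t n)
{
  assert(n % NARROW_LANES == 0);

  uint32_t acc[WIDE_LANES] = { 0 };
  for (size_t i = 0; i < n; i += NARROW_LANES)
    {
      uaddwb_model(acc, x + i);
      uaddwt_model(acc, x + i);
    }

  uint32_t sum = 0;
  for (int i = 0; i < WIDE_LANES; i++)
    sum += acc[i];
  return sum;
}
```

In this model the accumulator lanes collect different elements than in the unpack-lo/unpack-hi sequence, but the loop only needs the final reduction, so the lane assignment does not affect the result.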
