https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122749
--- Comment #4 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tamar Christina <[email protected]>:

https://gcc.gnu.org/g:cb40e813b8f09f9d3a6000901f1373b476a20886

commit r16-7057-gcb40e813b8f09f9d3a6000901f1373b476a20886
Author: Tamar Christina <[email protected]>
Date:   Tue Jan 27 09:12:16 2026 +0000

    middle-end: teach convert_mult_to_fma to handle casts between addend
    and multiplicand [PR122749]

    The following example

      int foo2 (char *buf, int len)
      {
        int x = 0;
        for (int i = 0; i < len; i++)
          x += (int) i * buf[i];
        return x;
      }

    compiled with -O3 -mcpu=neoverse-v2 used to generate a 4x unrolled MLA
    sequence

      mla     z29.s, p7/m, z2.s, z0.s
      mla     z27.s, p7/m, z4.s, z26.s
      mla     z30.s, p7/m, z1.s, z0.s
      mla     z28.s, p7/m, z23.s, z3.s

    but now generates MUL + ADD

      mul     z2.s, z2.s, z1.s
      mul     z4.s, z4.s, z26.s
      mul     z1.s, z24.s, z1.s
      mul     z3.s, z23.s, z3.s
      add     z29.s, z2.s, z29.s
      add     z30.s, z1.s, z30.s
      add     z28.s, z3.s, z28.s
      add     z0.s, z4.s, z0.s

    This is because, since the fix r16-3328-g3182e95eda4, we insert casts
    around the reduction addend, which causes convert_mult_to_fma to miss
    the mul + add sequence.  This patch teaches it to look through the
    casts to find the operands, and to accept a conversion only if it is
    essentially a sign-changing operation.

    Concretely, it converts:

      # vect_vec_iv_.13_49 = PHI <_50(5), { 0, 1, 2, ... }(4)>
      vect__3.8_38 = MEM <vector([4,4]) char> [(char *)_16];
      vect__4.12_45 = (vector([4,4]) int) vect__3.8_38;
      vect__5.14_54 = vect__4.12_45 * vect_vec_iv_.13_49;
      vect_x_12.17_62 = VIEW_CONVERT_EXPR<vector([4,4]) unsigned int>(vect__5.14_54);
      vect_x_12.17_63 = VIEW_CONVERT_EXPR<vector([4,4]) unsigned int>(vect_x_16.15_58);
      vect_x_12.17_64 = vect_x_12.17_62 + vect_x_12.17_63;
      vect_x_12.16_65 = VIEW_CONVERT_EXPR<vector([4,4]) int>(vect_x_12.17_64);

    into:

      # vect_vec_iv_.13_49 = PHI <_50(5), { 0, 1, 2, ... }(4)>
      vect__3.8_38 = MEM <vector([4,4]) charD.8> [(charD.8 *)_16];
      vect__4.12_45 = (vector([4,4]) intD.7) vect__3.8_38;
      vect_x_12.17_63 = VIEW_CONVERT_EXPR<vector([4,4]) unsigned int>(vect_x_16.15_58);
      _2 = (vector([4,4]) unsigned int) vect_vec_iv_.13_49;
      _1 = (vector([4,4]) unsigned int) vect__4.12_45;
      vect_x_12.17_64 = .FMA (_1, _2, vect_x_12.17_63);
      vect_x_12.16_65 = VIEW_CONVERT_EXPR<vector([4,4]) intD.7>(vect_x_12.17_64);

    thus restoring FMAs on reductions.

    gcc/ChangeLog:

            PR tree-optimization/122749
            * tree-ssa-math-opts.cc (convert_mult_to_fma_1,
            convert_mult_to_fma): Unwrap converts around addend.

    gcc/testsuite/ChangeLog:

            PR tree-optimization/122749
            * gcc.target/aarch64/pr122749_1.c: New test.
            * gcc.target/aarch64/pr122749_2.c: New test.
            * gcc.target/aarch64/pr122749_3.c: New test.
            * gcc.target/aarch64/pr122749_4.c: New test.
            * gcc.target/aarch64/pr122749_5.c: New test.
            * gcc.target/aarch64/pr122749_6.c: New test.
            * gcc.target/aarch64/pr122749_8.c: New test.
            * gcc.target/aarch64/pr122749_9.c: New test.
            * gcc.target/aarch64/sve/pr122749_1.c: New test.
            * gcc.target/aarch64/sve/pr122749_11.c: New test.
            * gcc.target/aarch64/sve/pr122749_12.c: New test.
            * gcc.target/aarch64/sve/pr122749_13.c: New test.
            * gcc.target/aarch64/sve/pr122749_14.c: New test.
            * gcc.target/aarch64/sve/pr122749_2.c: New test.
            * gcc.target/aarch64/sve/pr122749_3.c: New test.
            * gcc.target/aarch64/sve/pr122749_4.c: New test.
            * gcc.target/aarch64/sve/pr122749_5.c: New test.
            * gcc.target/aarch64/sve/pr122749_6.c: New test.
            * gcc.target/aarch64/sve/pr122749_8.c: New test.
            * gcc.target/aarch64/sve/pr122749_9.c: New test.
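To see why looking through these casts is safe: the vectorizer performs the
reduction add in an unsigned type to avoid signed-overflow UB, and a cast
that only changes signedness does not change the bit pattern, so fusing the
multiply and the add across it yields the same two's-complement result. A
minimal scalar sketch of that equivalence (an editor's illustration, not
part of the commit):

  #include <stdio.h>

  int
  main (void)
  {
    int a = -3, b = 100000, c = 7;

    /* What the vectorized reduction does: compute in unsigned, where
       overflow wraps instead of being undefined, then view the fused
       multiply-add result back as signed.  */
    unsigned int ua = (unsigned int) a;
    unsigned int ub = (unsigned int) b;
    unsigned int uc = (unsigned int) c;
    int fused = (int) (ua * ub + uc);

    /* Identical to the straightforward signed computation whenever the
       signed form does not overflow.  */
    printf ("%d %d\n", fused, a * b + c);   /* prints -299993 -299993 */
    return 0;
  }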

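The new tests themselves are not quoted in this comment. A sketch of the
kind of input they presumably exercise, reduced from the example above; the
dg-options and the scan pattern here are assumptions, not taken from the
commit:

  /* Hypothetical reduced testcase in the spirit of
     gcc.target/aarch64/sve/pr122749_1.c; the real file contents are not
     shown in this comment.  */
  /* { dg-do compile } */
  /* { dg-options "-O3 -mcpu=neoverse-v2" } */

  int
  foo2 (char *buf, int len)
  {
    int x = 0;
    for (int i = 0; i < len; i++)
      x += (int) i * buf[i];
    return x;
  }

  /* After the fix the reduction should again use a fused
     multiply-accumulate rather than separate mul + add.  */
  /* { dg-final { scan-assembler {\tmla\t} } } */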