https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122749

--- Comment #4 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tamar Christina <[email protected]>:

https://gcc.gnu.org/g:cb40e813b8f09f9d3a6000901f1373b476a20886

commit r16-7057-gcb40e813b8f09f9d3a6000901f1373b476a20886
Author: Tamar Christina <[email protected]>
Date:   Tue Jan 27 09:12:16 2026 +0000

    middle-end: teach convert_mult_to_fma handle casts between addend and multiplicant [PR122749]

    The following example

    int foo2 (char *buf, int len) {
        int x = 0;
        for (int i = 0; i < len; i++) {
            x += (int) i * buf[i];
        }
        return x;
    }

    compiled with -O3 -mcpu=neoverse-v2 used to generate a 4x unrolled MLA
    sequence

            mla     z29.s, p7/m, z2.s, z0.s
            mla     z27.s, p7/m, z4.s, z26.s
            mla     z30.s, p7/m, z1.s, z0.s
            mla     z28.s, p7/m, z23.s, z3.s

    but now generates MUL + ADD

            mul     z2.s, z2.s, z1.s
            mul     z4.s, z4.s, z26.s
            mul     z1.s, z24.s, z1.s
            mul     z3.s, z23.s, z3.s
            add     z29.s, z2.s, z29.s
            add     z30.s, z1.s, z30.s
            add     z28.s, z3.s, z28.s
            add     z0.s, z4.s, z0.s

    This is because, since the fix for r16-3328-g3182e95eda4, we now insert
    casts around the reduction addend.  These casts cause convert_mult_to_fma
    to miss the mul + add sequence.

    This patch teaches it to look through the casts on the operands and to
    accept a conversion only if it is essentially a sign-changing operation.
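
    As a rough illustration of why only sign-changing conversions are safe to
    look through, the C sketch below compares three scalar models.  The helper
    names (fused_u32, mul_signcast_add, mul16_widen_add) are hypothetical and
    do not appear in the patch, and the code assumes a 32-bit int.  A
    same-width signed-to-unsigned cast leaves the bits of the product
    unchanged, so fusing the multiply and add gives the same result; a cast
    that changes the width does not.

```c
#include <stdint.h>

/* Reference: multiply-add carried out entirely in 32-bit unsigned
   arithmetic, which wraps modulo 2^32.  */
static uint32_t fused_u32(uint32_t a, uint32_t b, uint32_t c)
{
    return a * b + c;
}

/* Signed multiply, sign-changing cast, then unsigned add.  The product is
   formed in 64 bits so that this C model has no signed overflow; the
   conversion to uint32_t reduces it modulo 2^32.  */
static uint32_t mul_signcast_add(int32_t a, int32_t b, uint32_t c)
{
    uint32_t prod = (uint32_t)((int64_t)a * (int64_t)b);
    return prod + c;
}

/* 16-bit multiply, widening cast, then 32-bit add.  The product is
   truncated to 16 bits before it is widened, so high bits are lost.  */
static uint32_t mul16_widen_add(uint16_t a, uint16_t b, uint32_t c)
{
    uint16_t prod = (uint16_t)((uint32_t)a * (uint32_t)b);
    return (uint32_t)prod + c;
}
```

    For example, mul_signcast_add(-3, 7, 10) and
    fused_u32((uint32_t)-3, 7, 10) agree, whereas mul16_widen_add(300, 300, 1)
    and fused_u32(300, 300, 1) differ because 300 * 300 does not fit in
    16 bits.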

    Concretely, it converts:

      # vect_vec_iv_.13_49 = PHI <_50(5), { 0, 1, 2, ... }(4)>
      vect__3.8_38 = MEM <vector([4,4]) char> [(char *)_16];
      vect__4.12_45 = (vector([4,4]) int) vect__3.8_38;
      vect__5.14_54 = vect__4.12_45 * vect_vec_iv_.13_49;
      vect_x_12.17_62 = VIEW_CONVERT_EXPR<vector([4,4]) unsigned int>(vect__5.14_54);
      vect_x_12.17_63 = VIEW_CONVERT_EXPR<vector([4,4]) unsigned int>(vect_x_16.15_58);
      vect_x_12.17_64 = vect_x_12.17_62 + vect_x_12.17_63;
      vect_x_12.16_65 = VIEW_CONVERT_EXPR<vector([4,4]) int>(vect_x_12.17_64);

    into:

      # vect_vec_iv_.13_49 = PHI <_50(5), { 0, 1, 2, ... }(4)>
      vect__3.8_38 = MEM <vector([4,4]) charD.8> [(charD.8 *)_16];
      vect__4.12_45 = (vector([4,4]) intD.7) vect__3.8_38;
      vect_x_12.17_63 = VIEW_CONVERT_EXPR<vector([4,4]) unsigned int>(vect_x_16.15_58);
      _2 = (vector([4,4]) unsigned int) vect_vec_iv_.13_49;
      _1 = (vector([4,4]) unsigned int) vect__4.12_45;
      vect_x_12.17_64 = .FMA (_1, _2, vect_x_12.17_63);
      vect_x_12.16_65 = VIEW_CONVERT_EXPR<vector([4,4]) intD.7>(vect_x_12.17_64);

    thus restoring FMAs on reductions.
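
    The rewritten IR can be read, per lane, roughly as the scalar C model
    below.  This is only a sketch under stated assumptions: the function names
    are hypothetical, signed char is used so the element signedness does not
    depend on the target, and the final conversion back to int relies on the
    modulo behaviour that GCC documents for out-of-range values.

```c
/* Reference reduction: multiply and accumulate in signed int.  */
static int reduce_ref(const signed char *buf, int len)
{
    int x = 0;
    for (int i = 0; i < len; i++)
        x += i * buf[i];
    return x;
}

/* Scalar model of the rewritten IR: view the accumulator as unsigned,
   cast both multiply operands to unsigned, do one multiply-add, and view
   the result as int again.  */
static int reduce_unsigned_fma(const signed char *buf, int len)
{
    int x = 0;
    for (int i = 0; i < len; i++) {
        unsigned addend = (unsigned)x;          /* VIEW_CONVERT_EXPR of the accumulator */
        unsigned op1 = (unsigned)(int)buf[i];   /* corresponds to _1 */
        unsigned op2 = (unsigned)i;             /* corresponds to _2 */
        unsigned sum = op1 * op2 + addend;      /* corresponds to .FMA (_1, _2, addend) */
        x = (int)sum;                           /* VIEW_CONVERT_EXPR back to int */
    }
    return x;
}
```

    With buf = {1, -2, 3, -4} and len = 4, both functions return -8.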

    gcc/ChangeLog:

            PR tree-optimization/122749
            * tree-ssa-math-opts.cc (convert_mult_to_fma_1,
            convert_mult_to_fma): Unwrap converts around addend.

    gcc/testsuite/ChangeLog:

            PR tree-optimization/122749
            * gcc.target/aarch64/pr122749_1.c: New test.
            * gcc.target/aarch64/pr122749_2.c: New test.
            * gcc.target/aarch64/pr122749_3.c: New test.
            * gcc.target/aarch64/pr122749_4.c: New test.
            * gcc.target/aarch64/pr122749_5.c: New test.
            * gcc.target/aarch64/pr122749_6.c: New test.
            * gcc.target/aarch64/pr122749_8.c: New test.
            * gcc.target/aarch64/pr122749_9.c: New test.
            * gcc.target/aarch64/sve/pr122749_1.c: New test.
            * gcc.target/aarch64/sve/pr122749_11.c: New test.
            * gcc.target/aarch64/sve/pr122749_12.c: New test.
            * gcc.target/aarch64/sve/pr122749_13.c: New test.
            * gcc.target/aarch64/sve/pr122749_14.c: New test.
            * gcc.target/aarch64/sve/pr122749_2.c: New test.
            * gcc.target/aarch64/sve/pr122749_3.c: New test.
            * gcc.target/aarch64/sve/pr122749_4.c: New test.
            * gcc.target/aarch64/sve/pr122749_5.c: New test.
            * gcc.target/aarch64/sve/pr122749_6.c: New test.
            * gcc.target/aarch64/sve/pr122749_8.c: New test.
            * gcc.target/aarch64/sve/pr122749_9.c: New test.