[Bug tree-optimization/123190] [16 Regression] 8% slowdown of 433.milc on AMD zen4 since r16-5275-ga645e903e8c394

cvs-commit at gcc dot gnu.org via Gcc-bugs Wed, 14 Jan 2026 05:44:57 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123190


--- Comment #5 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <[email protected]>:

https://gcc.gnu.org/g:96bc77e45c202303b87647a44206c6475f996c5c

commit r16-6766-g96bc77e45c202303b87647a44206c6475f996c5c
Author: Richard Biener <[email protected]>
Date:   Wed Jan 14 10:53:05 2026 +0100

    tree-optimization/123190 - allow VF == 1 epilog vectorization

    The following adjusts the condition under which we reject vectorization
    because the scalar loop runs for only a single iteration (or two,
    in case we need to peel for gaps).  This check is over-eager for
    VF == 1, where the cost model should instead decide whether
    vectorization is worthwhile.  I'm being conservative here and
    still exclude the two-iteration case, as I do not have benchmark
    evidence for it.

    This helps fix a regression observed with improved SLP handling,
    though not with the exact options used in the PR; with the more
    common -O3 -march=x86-64-v3 it speeds up 433.milc by 6%.

            PR tree-optimization/123190
            * tree-vect-loop.cc (vect_analyze_loop_costing): Allow
            vectorizing loops with a single scalar iteration iff the
            vectorization factor is 1.

            * gcc.dg/vect/costmodel/x86_64/costmodel-pr123190-1.c: New
            testcase.
            * gcc.dg/vect/slp-28.c: Avoid epilogue vectorization for
            simplicity.
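The rejection rule described in the commit message can be modelled as a small predicate. This is a simplified illustration, not GCC's code: the function name `reject_for_short_trip_count` and its parameters are hypothetical, and the sketch assumes the scalar trip count is a known compile-time constant. With VF == 1, each vector iteration covers exactly one scalar iteration, so any gain has to come from SLP packing the statements within that iteration; whether that pays off is a cost-model question rather than something the trip count alone settles.

```c
#include <stdbool.h>

/* Hypothetical model of the early "loop only has a single iteration"
   rejection described in the commit message.  Names and parameters are
   illustrative assumptions, not GCC's vect_analyze_loop_costing code.
   Returns true if vectorization is rejected before the cost model runs.  */
bool
reject_for_short_trip_count (unsigned long scalar_niters,
                             bool peeling_for_gaps,
                             unsigned vf)
{
  /* Loops with at most this many scalar iterations are candidates for
     rejection: one, or two when we need to peel for gaps.  */
  unsigned long limit = peeling_for_gaps ? 2 : 1;

  /* Enough iterations: leave the decision to the cost model.  */
  if (scalar_niters > limit)
    return false;

  /* The relaxed case: a single scalar iteration with VF == 1 is no
     longer rejected up front; the cost model decides instead.  */
  if (vf == 1 && scalar_niters == 1)
    return false;

  /* Everything else in the short-trip-count range, including the
     two-iteration peeling-for-gaps case, is still rejected.  */
  return true;
}
```

For example, a one-iteration loop with VF == 4 is still rejected up front, while the same loop with VF == 1 is passed on to the cost model; a two-iteration loop that needs peeling for gaps stays rejected even at VF == 1.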
