https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123190

--- Comment #6 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <[email protected]>:

https://gcc.gnu.org/g:948d33f490a6b0051376da6bdcf55223a552b30f

commit r16-6767-g948d33f490a6b0051376da6bdcf55223a552b30f
Author: Richard Biener <[email protected]>
Date:   Wed Jan 14 12:45:19 2026 +0100

    tree-optimization/123190 - fix costing of permuted contiguous loads

    The following fixes a regression from the time we split load groups
    along SLP boundaries.  When we face a permuted load from an access
    that is contiguous across loop iterations we emit code that loads
    the whole group and then emit required permutations.  The permutations
    might not need all those loads, and if we split the group we would
    not have emitted them.  Fortunately when analyzing a permutation
    we compute both the number of required permutes and the number of
    loads that will survive the following DCE.  So make sure to use that
    when costing.  This allows the previously added testcase for PR123190
    to undergo epilog vectorization also at -O2, as well as with non-generic
    tuning such as Zen4, which raises the cost of XMM loads.

            PR tree-optimization/123190
            * tree-vectorizer.h (vect_load_store_data): Add n_loads member.
            * tree-vect-stmts.cc (get_load_store_type): Record the
            number of required loads for permuted loads.
            (vectorizable_load): Make use of this when costing loads
            for VMAT_CONTIGUOUS[_REVERSE].

            * gcc.dg/vect/costmodel/x86_64/costmodel-pr123190-1.c: Do not
            require -mtune=generic.
            * gcc.dg/vect/costmodel/x86_64/costmodel-pr123190-2.c: Add
            variant with -O2 instead of -O3, inner loop not unrolled.
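
As a rough illustration of the idea in the commit message above (cost only the
loads that a permutation actually references, since the rest are removed by
DCE), here is a minimal sketch in C.  It is not GCC code: the function
count_live_loads, the MAX_VECS limit and the flat lane-index representation of
the permutation are all assumptions made for this example.

```c
#include <stdbool.h>

/* Hypothetical upper bound on the number of vector loads in one group.  */
#define MAX_VECS 64

/* Hypothetical helper, not part of GCC.  A load group is covered by N_VECS
   vector loads of NUNITS lanes each.  PERM lists the group element indices
   that the permuted result needs.  Return how many of the vector loads are
   referenced by at least one permutation entry; the others would be dead
   after the permutation is emitted and should not be costed.  */
static unsigned
count_live_loads (const unsigned *perm, unsigned n_perm,
                  unsigned nunits, unsigned n_vecs)
{
  bool used[MAX_VECS] = { false };
  unsigned live = 0;

  if (nunits == 0)
    return 0;

  for (unsigned i = 0; i < n_perm; i++)
    {
      /* Index of the vector load that holds this group element.  */
      unsigned vec = perm[i] / nunits;
      if (vec < n_vecs && vec < MAX_VECS && !used[vec])
        {
          used[vec] = true;
          live++;
        }
    }
  return live;
}
```

For a group of eight elements loaded as two four-lane vectors, a permutation
that only selects elements 0..3 references one load, so under this sketch only
one load would be costed instead of two.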
