https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122448

Jeffrey A. Law <law at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2025-11-03
     Ever confirmed|0                           |1
            Summary|Wrong rvv code with -O3     |[15 16 Regression] Wrong
                   |                            |rvv code with -O3
             Status|UNCONFIRMED                 |NEW
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #2 from Jeffrey A. Law <law at gcc dot gnu.org> ---
Bisection landed on:

commit d93353e6423ecaaae9fa47d0935caafd9abfe4de
Author: Richard Biener <[email protected]>
Date:   Fri Feb 23 11:45:50 2024 +0100

    Do single-lane SLP discovery for reductions

    The following performs single-lane SLP discovery for reductions.
    It requires a fixup for outer loop vectorization where a check
    for multiple types needs adjustments as otherwise bogus pointer
    IV increments happen when there are multiple copies of vector stmts
    in the inner loop.

    For the reduction epilog handling this extends the optimized path
    to cover the trivial single-lane SLP reduction case.

    The fix for PR65518 implemented in vect_grouped_load_supported for
    non-SLP needs a SLP counterpart that I put in get_group_load_store_type.

    I've decided to adjust three testcases for appearing single-lane
    SLP instances instead of not dumping "vectorizing stmts using SLP"
    for single-lane instances as that also requires testsuite adjustments.

            * tree-vect-slp.cc (vect_build_slp_tree_2): Only multi-lane
            discoveries are reduction chains and need special backedge
            treatment.
            (vect_analyze_slp): Fall back to single-lane SLP discovery
            for reductions.  Make sure to try single-lane SLP reduction
            for all reductions as fallback.
            (vectorizable_load): Avoid outer loop SLP vectorization with
            multi-copy vector stmts in the inner loop.
            (vectorizable_store): Likewise.
            * tree-vect-loop.cc (vect_create_epilog_for_reduction): Allow
            direct opcode and shift reduction also for SLP reductions
            with a single lane.
            * tree-vect-stmts.cc (get_group_load_store_type): For SLP also
            check for the PR65518 single-element interleaving case as done in
            vect_grouped_load_supported.

            * gcc.dg/vect/slp-24.c: Expect another SLP instance for the
            reduction.
            * gcc.dg/vect/slp-24-big-array.c: Likewise.
            * gcc.dg/vect/slp-reduc-6.c: Remove scan for zero SLP instances.
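
For readers unfamiliar with the term, the "reductions" this commit targets are loops that fold many elements into one scalar. The sketch below is a hypothetical illustration only: it is not the testcase from this PR, the function name is made up, and it is not claimed to reproduce the wrong code. It only shows the kind of loop shape that reduction vectorization deals with.

```c
#include <stddef.h>

/* Hypothetical example, NOT the reproducer from this PR.
   A plain sum reduction: every iteration accumulates into the
   single scalar 'sum', which is live after the loop.  */
int
sum_reduction (const int *a, size_t n)
{
  int sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += a[i];
  return sum;
}
```

When compiled with vectorization enabled (for example at -O3), the vectorizer may accumulate partial sums in a vector register and combine them after the loop in a reduction epilogue, which is the area the bisected commit touches.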
