https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123089
--- Comment #18 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tamar Christina <[email protected]>:

https://gcc.gnu.org/g:fb1855b4179ab8d4bb461b7226ec43cf9005c753

commit r16-6316-gfb1855b4179ab8d4bb461b7226ec43cf9005c753
Author: Tamar Christina <[email protected]>
Date:   Sun Dec 21 08:27:13 2025 +0000

    vect: use wider precision type for generating early break scalar IV [PR123089]

    In the PR we see that the new scalar IV tricks other passes into thinking
    there is an overflow at the use of a signed counter.

    The loop is known to iterate 8191 times, we have a VF of 8, and the IV
    starts at 2.

    The codegen out of the vectorizer is the same as before, except we now
    have a scalar variable counting the scalar iteration count instead of a
    vector one, i.e. we have

      _45 = _39 + 8;

    vs

      _46 = _45 + { 16, 16, 16, 16, ... }

    (we pick a lower VF now since costing allows it, but that is not
    important).

    When we get to cunroll, since the value is now scalar, it sees that
    8 * 8191 would overflow a signed short and so it changes the loop bound
    to the largest possible signed value, and then uses this to elide
    ivtmp_50 < 8191 as always true, so you get an infinite loop:

      Analyzing # of iterations of loop 1
        exit condition [1, + , 1](no_overflow) < 8191
        bounds on difference of bases: 8190 ... 8190
        result:
          # of iterations 8190, bounded by 8190
      Statement (exit)if (ivtmp_50 < 8191)
       is executed at most 8190 (bounded by 8190) + 1 times in loop 1.
      Induction variable (signed short) 8 + 8 * iteration does not wrap in
      statement _45 = _39 + 8; in loop 1.
      Statement _45 = _39 + 8;
       is executed at most 4094 (bounded by 4094) + 1 times in loop 1.

    The signed type was originally chosen because of the negative offset we
    use when adjusting for peeling for alignment with masks.  However this
    then introduces issues with signed overflow, as we see here.

    This patch instead determines the smallest possible unsigned type for the
    scalar IV where the overflow won't happen once we include the extra bit
    for the sign.  I.e. if the scalar IV is an unsigned 8-bit value we pick a
    signed 16-bit type, but if it is a signed 8-bit value we pick an unsigned
    8-bit type.

    We use the initial niters value to determine the smallest size possible,
    to prevent cases such as a 64-bit IV in the source requiring a TImode
    counter.  I also only require the additional bit when I know we'll be
    generating the SMAX.

    I've now moved this to vectorizable_early_exit such that if we do end up
    needing something like TImode we don't vectorize if the target doesn't
    support it.

    I've also added some testcases for masking around the boundary values.
    I've only added them for char to reduce the runtime of the tests.

    gcc/ChangeLog:

            PR tree-optimization/123089
            * tree-vect-loop.cc
            (vect_update_ivs_after_vectorizer_for_early_breaks): Add
            conversion if required.  Note that if we did truncate, the
            original scalar loop had an overflow here anyway.
            (vect_get_max_nscalars_per_iter): Expose.
            * tree-vect-stmts.cc
            (vect_compute_type_for_early_break_scalar_iv): New.
            (vectorizable_early_exit): Find smallest type where we won't have
            UB in the signed IV and store it.
            * tree-vectorizer.h (LOOP_VINFO_EARLY_BRK_IV_TYPE): New.
            (class _loop_vec_info): Add early_break_iv_type.
            (vect_min_prec_for_max_niters): New.
            * tree-vect-loop-manip.cc (vect_do_peeling): Use it.

    gcc/testsuite/ChangeLog:

            PR tree-optimization/123089
            * gcc.dg/vect/vect-early-break_141-pr123089.c: New test.
            * gcc.target/aarch64/sve/peel_ind_14.c: New test.
            * gcc.target/aarch64/sve/peel_ind_14_run.c: New test.
            * gcc.target/aarch64/sve/peel_ind_15.c: New test.
            * gcc.target/aarch64/sve/peel_ind_15_run.c: New test.
            * gcc.target/aarch64/sve/peel_ind_16.c: New test.
            * gcc.target/aarch64/sve/peel_ind_16_run.c: New test.
            * gcc.target/aarch64/sve/peel_ind_17.c: New test.
            * gcc.target/aarch64/sve/peel_ind_17_run.c: New test.
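For readers unfamiliar with the failure mode described in the commit message, the following standalone C program is a rough sketch of the kind of loop involved: a fixed-count early-break loop whose scalar iteration count (8 * 8191 here) no longer fits in a signed short. It is an illustration only, not the testcase added by the commit; the arrays, the find_mismatch function and the exact bounds are invented for the example.

/* Illustration only (not the committed testcase): an early-break loop of
   roughly the shape discussed above.  The scalar iteration count reaches
   8 * 8191 = 65528, which does not fit in a signed short, so a signed
   short scalar IV stepping by the VF would overflow.  */

#define N 8191

static char a[N], b[N];

__attribute__ ((noipa)) int
find_mismatch (void)
{
  /* The return inside the loop is the early exit the vectorizer has to
     handle; the loop bound is a compile-time constant.  */
  for (int i = 2; i < N; i++)
    if (a[i] != b[i])
      return i;
  return -1;
}

int
main (void)
{
  __builtin_memset (a, 1, N);
  __builtin_memset (b, 1, N);
  b[N - 1] = 2;                 /* force the break on the last iteration */
  if (find_mismatch () != N - 1)
    __builtin_abort ();
  return 0;
}

With the bug, cunroll could treat the widened bound of the miscounted scalar IV as never reached and turn such a loop into an infinite one.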

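As a rough picture of the type-selection idea, here is a hypothetical helper. It is not the code added by the patch (the real functions are vect_compute_type_for_early_break_scalar_iv and vect_min_prec_for_max_niters); the name, interface and rounding scheme below are made up to show the reasoning: pick the smallest standard precision that can hold max_niters * vf, and reserve one extra bit when the signed SMAX adjustment will be generated.

/* Hypothetical sketch, NOT the GCC implementation: compute the smallest
   standard integer precision for the early-break scalar IV such that
   max_niters * vf cannot overflow, with one extra bit reserved when a
   signed SMAX adjustment will be emitted.  */

#include <stdint.h>

static unsigned
iv_precision_for_early_break (uint64_t max_niters, unsigned vf,
                              int needs_sign_bit)
{
  uint64_t max_val = max_niters * vf;   /* largest scalar iteration count */
  unsigned bits = 1;
  while (bits < 64 && (max_val >> bits) != 0)
    bits++;                             /* bits needed to represent max_val */
  if (needs_sign_bit)
    bits++;                             /* room for the sign bit */

  /* Round up to the next standard precision.  */
  for (unsigned prec = 8; prec <= 64; prec *= 2)
    if (bits <= prec)
      return prec;
  return 128;                           /* would need a TImode counter */
}

For the loop in this PR, 8 * 8191 = 65528 needs 16 bits, so a 16-bit unsigned counter is sufficient, whereas the previous signed short (15 value bits) overflows, which is exactly what cunroll noticed.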