https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120352

--- Comment #2 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tamar Christina <[email protected]>:

https://gcc.gnu.org/g:8af2e8e49d6e5d33c01c2beaead4933bc286974c

commit r17-837-g8af2e8e49d6e5d33c01c2beaead4933bc286974c
Author: Tamar Christina <[email protected]>
Date:   Wed May 27 10:53:07 2026 +0100

    vect: Don't generate scalar epilogue if not needed [PR120352]

    The example loop

    #define N 4
    int a[N] = {0,0,0,1};
    int b[N] = {0,0,0,1};

    __attribute__((noipa, noinline))
    int foo ()
    {
      for (int i = 0; i < N; i++)
        {
          if (a[i] > b[i])
            return 1;
        }
      return 0;
    }

    compiled with -O3 -march=armv9-a generates

    foo:
            adrp    x2, .LANCHOR0
            add     x1, x2, :lo12:.LANCHOR0
            ptrue   p7.b, vl16
            mov     w0, 0
            ldr     q30, [x2, #:lo12:.LANCHOR0]
            ldr     q31, [x1, 16]
            cmpgt   p7.s, p7/z, z30.s, z31.s
            b.any   .L7
            ret
    .L7:
            ldr     w2, [x2, #:lo12:.LANCHOR0]
            ldr     w0, [x1, 16]
            cmp     w2, w0
            bgt     .L4
            ldr     w0, [x1, 4]
            ldr     w2, [x1, 20]
            cmp     w2, w0
            blt     .L4
            ldr     w0, [x1, 8]
            ldr     w2, [x1, 24]
            cmp     w2, w0
            blt     .L4
            ldr     w2, [x1, 12]
            ldr     w0, [x1, 28]
            cmp     w2, w0
            cset    w0, gt
            ret
    .L4:
            mov     w0, 1
            ret

    That is, when the vector compare finds a matching element, we still branch
    to the scalar code just to return 1.  Obviously the scalar code is
    completely unneeded.

    This patch teaches the vectorizer that when

    1. We have no live values
    2. We only have one exit (this is a restriction that will be lifted in a
       later patch and is there because we need masking to avoid false
       positives, but see testcase vect-early-break-no-epilog_11.c)
    3. The loop has no side-effects

    then we don't need the scalar epilogue at all.

    e.g. for the above we now generate

    foo:
            adrp    x0, .LANCHOR0
            add     x0, x0, :lo12:.LANCHOR0
            ptrue   p7.s, vl4
            ldp     q31, q30, [x0]
            cmplt   p15.s, p7/z, z30.s, z31.s
            cset    w0, any
            ret
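
As an illustration of the three conditions listed in the commit message, here is a minimal sketch in C. Only `a`, `b`, and the shape of the first loop come from the commit message; the function names and the `hits` array are hypothetical, and the comments describe which condition each loop would presumably meet or fail, not verified compiler output.

```c
#define N 4
int a[N] = {0, 0, 0, 1};
int b[N] = {0, 0, 0, 1};
int hits[N];   /* hypothetical array, used only to show a side effect */

/* Same shape as foo () in the commit message: a single early exit,
   no value computed in the loop is used afterwards, and no stores.
   This is the kind of loop the commit says needs no scalar epilogue.  */
int found_any (void)
{
  for (int i = 0; i < N; i++)
    if (a[i] > b[i])
      return 1;
  return 0;
}

/* Assumed counter-example for condition 1: the index i is live after
   the early exit, so the matching lane still has to be identified.  */
int first_index (void)
{
  for (int i = 0; i < N; i++)
    if (a[i] >= b[i])
      return i;
  return -1;
}

/* Assumed counter-example for condition 3: the store to hits[] is a
   side effect that must happen for exactly the iterations executed.  */
int found_any_with_store (void)
{
  for (int i = 0; i < N; i++)
    {
      hits[i] = 1;
      if (a[i] > b[i])
        return 1;
    }
  return 0;
}
```

With the data above, `found_any ()` returns 0 because no element of `a` exceeds the matching element of `b`. The only information that leaves that loop is whether any lane matched, which is why a single vector compare followed by `cset w0, any` is enough.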

    gcc/ChangeLog:

            PR tree-optimization/120352
            * tree-vectorizer.h (LOOP_VINFO_EARLY_BRK_NEEDS_EPILOG): New.
            (class _loop_vec_info): Add early_break_needs_epilogue.
            * tree-vect-data-refs.cc (vect_analyze_early_break_dependences):
            Detect usage of stores.
            * tree-vect-loop-manip.cc (vect_do_peeling): Use them.
            * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Likewise.
            (vect_create_loop_vinfo): Likewise.
            (vect_update_ivs_after_vectorizer_for_early_breaks): Likewise.
            * tree-vect-stmts.cc (vect_stmt_relevant_p): Likewise.

    gcc/testsuite/ChangeLog:

            PR tree-optimization/120352
            * gcc.dg/vect/vect-early-break-no-epilog_1.c: New test.
            * gcc.dg/vect/vect-early-break-no-epilog_10.c: New test.
            * gcc.dg/vect/vect-early-break-no-epilog_11.c: New test.
            * gcc.dg/vect/vect-early-break-no-epilog_2.c: New test.
            * gcc.dg/vect/vect-early-break-no-epilog_3.c: New test.
            * gcc.dg/vect/vect-early-break-no-epilog_4.c: New test.
            * gcc.dg/vect/vect-early-break-no-epilog_5.c: New test.
            * gcc.dg/vect/vect-early-break-no-epilog_6.c: New test.
            * gcc.dg/vect/vect-early-break-no-epilog_7.c: New test.
            * gcc.dg/vect/vect-early-break-no-epilog_8.c: New test.
            * gcc.dg/vect/vect-early-break-no-epilog_9.c: New test.
            * gcc.target/aarch64/noeffect.c: New test.
            * gcc.target/aarch64/noeffect10.c: New test.
            * gcc.target/aarch64/noeffect11.c: New test.
            * gcc.target/aarch64/noeffect2.c: New test.
            * gcc.target/aarch64/noeffect3.c: New test.
            * gcc.target/aarch64/noeffect4.c: New test.
            * gcc.target/aarch64/noeffect5.c: New test.
            * gcc.target/aarch64/noeffect6.c: New test.
            * gcc.target/aarch64/noeffect7.c: New test.
            * gcc.target/aarch64/noeffect8.c: New test.
            * gcc.target/aarch64/noeffect9.c: New test.
            * gcc.target/aarch64/sve/noeffect.c: New test.
            * gcc.target/aarch64/sve/noeffect10.c: New test.
            * gcc.target/aarch64/sve/noeffect11.c: New test.
            * gcc.target/aarch64/sve/noeffect2.c: New test.
            * gcc.target/aarch64/sve/noeffect3.c: New test.
            * gcc.target/aarch64/sve/noeffect4.c: New test.
            * gcc.target/aarch64/sve/noeffect5.c: New test.
            * gcc.target/aarch64/sve/noeffect6.c: New test.
            * gcc.target/aarch64/sve/noeffect7.c: New test.
            * gcc.target/aarch64/sve/noeffect8.c: New test.
            * gcc.target/aarch64/sve/noeffect9.c: New test.
