https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120352
--- Comment #2 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tamar Christina <[email protected]>:

https://gcc.gnu.org/g:8af2e8e49d6e5d33c01c2beaead4933bc286974c

commit r17-837-g8af2e8e49d6e5d33c01c2beaead4933bc286974c
Author: Tamar Christina <[email protected]>
Date:   Wed May 27 10:53:07 2026 +0100

    vect: Don't generate scalar epilogue if not needed [PR120352]

    The example loop

    #define N 4
    int a[N] = {0,0,0,1};
    int b[N] = {0,0,0,1};

    __attribute__((noipa, noinline))
    int foo ()
    {
      for (int i = 0; i < N; i++)
        {
          if (a[i] > b[i])
            return 1;
        }
      return 0;
    }

    compiled with -O3 -march=armv9-a generates

    foo:
            adrp    x2, .LANCHOR0
            add     x1, x2, :lo12:.LANCHOR0
            ptrue   p7.b, vl16
            mov     w0, 0
            ldr     q30, [x2, #:lo12:.LANCHOR0]
            ldr     q31, [x1, 16]
            cmpgt   p7.s, p7/z, z30.s, z31.s
            b.any   .L7
            ret
    .L7:
            ldr     w2, [x2, #:lo12:.LANCHOR0]
            ldr     w0, [x1, 16]
            cmp     w2, w0
            bgt     .L4
            ldr     w0, [x1, 4]
            ldr     w2, [x1, 20]
            cmp     w2, w0
            blt     .L4
            ldr     w0, [x1, 8]
            ldr     w2, [x1, 24]
            cmp     w2, w0
            blt     .L4
            ldr     w2, [x1, 12]
            ldr     w0, [x1, 28]
            cmp     w2, w0
            cset    w0, gt
            ret
    .L4:
            mov     w0, 1
            ret

    Which when we find an element, in order to return 1 we still go to scalar.
    Obviously the scalar code is completely unneeded.

    This patch teaches the vectorizer that when

    1. We have no live values
    2. We only have one exit (this is a restriction that will be lifted in a
       later patch and is there because we need masking to avoid false
       positives, but see testcase vect-early-break-no-epilog_11.c)
    3. The loop has no side-effects

    then we don't need the scalar epilogue at all.

    e.g. for the above we now generate

    foo:
            adrp    x0, .LANCHOR0
            add     x0, x0, :lo12:.LANCHOR0
            ptrue   p7.s, vl4
            ldp     q31, q30, [x0]
            cmplt   p15.s, p7/z, z30.s, z31.s
            cset    w0, any
            ret

    gcc/ChangeLog:

            PR tree-optimization/120352
            * tree-vectorizer.h (LOOP_VINFO_EARLY_BRK_NEEDS_EPILOG): New.
            (class _loop_vec_info): Add early_break_needs_epilogue.
            * tree-vect-data-refs.cc (vect_analyze_early_break_dependences):
            Detect usage of stores.
            * tree-vect-loop-manip.cc (vect_do_peeling): Use them.
            * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Likewise.
            (vect_create_loop_vinfo): Likewise.
            (vect_update_ivs_after_vectorizer_for_early_breaks): Likewise.
            * tree-vect-stmts.cc (vect_stmt_relevant_p): Likewise.

    gcc/testsuite/ChangeLog:

            PR tree-optimization/120352
            * gcc.dg/vect/vect-early-break-no-epilog_1.c: New test.
            * gcc.dg/vect/vect-early-break-no-epilog_10.c: New test.
            * gcc.dg/vect/vect-early-break-no-epilog_11.c: New test.
            * gcc.dg/vect/vect-early-break-no-epilog_2.c: New test.
            * gcc.dg/vect/vect-early-break-no-epilog_3.c: New test.
            * gcc.dg/vect/vect-early-break-no-epilog_4.c: New test.
            * gcc.dg/vect/vect-early-break-no-epilog_5.c: New test.
            * gcc.dg/vect/vect-early-break-no-epilog_6.c: New test.
            * gcc.dg/vect/vect-early-break-no-epilog_7.c: New test.
            * gcc.dg/vect/vect-early-break-no-epilog_8.c: New test.
            * gcc.dg/vect/vect-early-break-no-epilog_9.c: New test.
            * gcc.target/aarch64/noeffect.c: New test.
            * gcc.target/aarch64/noeffect10.c: New test.
            * gcc.target/aarch64/noeffect11.c: New test.
            * gcc.target/aarch64/noeffect2.c: New test.
            * gcc.target/aarch64/noeffect3.c: New test.
            * gcc.target/aarch64/noeffect4.c: New test.
            * gcc.target/aarch64/noeffect5.c: New test.
            * gcc.target/aarch64/noeffect6.c: New test.
            * gcc.target/aarch64/noeffect7.c: New test.
            * gcc.target/aarch64/noeffect8.c: New test.
            * gcc.target/aarch64/noeffect9.c: New test.
            * gcc.target/aarch64/sve/noeffect.c: New test.
            * gcc.target/aarch64/sve/noeffect10.c: New test.
            * gcc.target/aarch64/sve/noeffect11.c: New test.
            * gcc.target/aarch64/sve/noeffect2.c: New test.
            * gcc.target/aarch64/sve/noeffect3.c: New test.
            * gcc.target/aarch64/sve/noeffect4.c: New test.
            * gcc.target/aarch64/sve/noeffect5.c: New test.
            * gcc.target/aarch64/sve/noeffect6.c: New test.
            * gcc.target/aarch64/sve/noeffect7.c: New test.
            * gcc.target/aarch64/sve/noeffect8.c: New test.
            * gcc.target/aarch64/sve/noeffect9.c: New test.
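
For illustration only, and not part of the commit: a minimal C sketch of how the conditions in the commit message (no live values, no side effects) might look at the source level. any_greater repeats the loop shape from the commit message. first_greater, copy_until_greater, check_examples and the out array are hypothetical names introduced here. Which variants GCC actually compiles without a scalar epilogue is an assumption read off the wording of the conditions and has not been checked against the compiler or the new tests.

```c
#include <assert.h>

#define N 4
int a[N] = {0, 0, 0, 1};
int b[N] = {0, 0, 0, 1};
int out[N];   /* hypothetical buffer, only used to create a side effect */

/* Same shape as the loop in the commit message: the result does not
   depend on which element matched, and nothing is stored.  */
int any_greater (void)
{
  for (int i = 0; i < N; i++)
    if (a[i] > b[i])
      return 1;
  return 0;
}

/* Hypothetical variant: the index i is used after the early exit, so a
   value is live out of the loop (assumed to violate condition 1).  */
int first_greater (void)
{
  for (int i = 0; i < N; i++)
    if (a[i] > b[i])
      return i;
  return -1;
}

/* Hypothetical variant: a store happens before the exit test, so the
   loop has a side effect (assumed to violate condition 3).  */
int copy_until_greater (void)
{
  for (int i = 0; i < N; i++)
    {
      out[i] = a[i];
      if (a[i] > b[i])
        return 1;
    }
  return 0;
}

/* Checks only the scalar semantics of the three loops, not the
   generated code.  */
void check_examples (void)
{
  /* With the initial data no element of a exceeds b.  */
  assert (any_greater () == 0);
  assert (first_greater () == -1);
  assert (copy_until_greater () == 0);

  a[2] = 5;   /* make element 2 match */
  assert (any_greater () == 1);
  assert (first_greater () == 2);
  assert (copy_until_greater () == 1);
  assert (out[2] == 5);

  a[2] = 0;   /* restore the initial data */
}
```

Calling check_examples () from a test driver exercises the three loops. To see what the vectorizer does with each one, the generated assembly can be inspected with the options quoted in the commit message (-O3 -march=armv9-a).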
