https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117558
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---

  (group_size * LOOP_VINFO_VECT_FACTOR (loop_vinfo) - gap) % nunits

we know that group_size * LOOP_VINFO_VECT_FACTOR (loop_vinfo) is a multiple
of nunits, given we are only interested in the remainder this becomes then

  (LCM (nunits, group_size) - gap) % nunits

here we have nunits < group_size, gap can be > nunits as well.

Btw, I wonder whether for VL vectors we can rely on a loop mask/length
being present and thus "remain" being always X - gap, thus we always access
at most 'gap' elements in excess (the rest is masked off), and a single
scalar iteration is enough?  It's only when not having a loop mask/len that
the VF can make us access too many elements and also only when
nunits > group_size.

In turn this means we can make a single scalar iteration enough to peel by
enforcing a niter mask?

Do we have a testcase where peeling a single scalar iteration isn't enough
for VL vectors?
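
To illustrate the remainder argument above, here is a minimal sketch using plain unsigned integers. It assumes compile-time constant values, whereas the vectorizer works with poly_int quantities for VL vectors, and it assumes gap < group_size. The helper names (remain_full, remain_lcm, count_mismatches) are hypothetical and are not taken from the GCC sources.

```c
#include <assert.h>

/* Greatest common divisor, Euclid's algorithm.  */
static unsigned gcd_u (unsigned a, unsigned b)
{
  while (b != 0)
    {
      unsigned t = a % b;
      a = b;
      b = t;
    }
  return a;
}

/* Least common multiple of two positive values.  */
static unsigned lcm_u (unsigned a, unsigned b)
{
  assert (a > 0 && b > 0);
  return a / gcd_u (a, b) * b;
}

/* Remainder computed from the full group_size * VF product.
   Assumes gap < group_size so the subtraction cannot wrap.  */
static unsigned remain_full (unsigned group_size, unsigned vf,
                             unsigned gap, unsigned nunits)
{
  return (group_size * vf - gap) % nunits;
}

/* Remainder computed from LCM (nunits, group_size) instead.  */
static unsigned remain_lcm (unsigned group_size, unsigned gap,
                            unsigned nunits)
{
  return (lcm_u (nunits, group_size) - gap) % nunits;
}

/* Count the cases, for all values up to MAX, where the two forms
   disagree although group_size * vf is a multiple of nunits.  */
static unsigned count_mismatches (unsigned max)
{
  unsigned bad = 0;
  for (unsigned nunits = 1; nunits <= max; nunits++)
    for (unsigned group_size = 1; group_size <= max; group_size++)
      for (unsigned vf = 1; vf <= max; vf++)
        {
          if ((group_size * vf) % nunits != 0)
            continue;  /* Precondition of the argument does not hold.  */
          for (unsigned gap = 0; gap < group_size; gap++)
            if (remain_full (group_size, vf, gap, nunits)
                != remain_lcm (group_size, gap, nunits))
              bad++;
        }
  return bad;
}
```

For example, with nunits = 4, group_size = 6 and gap = 5 (so nunits < group_size and gap > nunits), both VF = 2 and VF = 4 give a remainder of 3, the same as the LCM form with LCM (4, 6) = 12.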