https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117558

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
(group_size * LOOP_VINFO_VECT_FACTOR (loop_vinfo) - gap) % nunits

We know that group_size * LOOP_VINFO_VECT_FACTOR (loop_vinfo) is a multiple
of nunits. Since we are only interested in the remainder, this then becomes

  (LCM (nunits, group_size) - gap) % nunits

Here we have nunits < group_size, and gap can be > nunits as well.
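A small C check of the simplification above. It assumes constant (non-VL) nunits, group_size and VF, and gap < group_size. The helper names (gcd_u, lcm_u, remain_direct, remain_lcm, forms_agree) are hypothetical and are not GCC internals; GCC itself works with poly_int values here. The reasoning it exercises: group_size * VF is a common multiple of nunits and group_size, so it is a multiple of LCM (nunits, group_size). The two dividends therefore differ by a multiple of nunits and leave the same remainder.

```c
#include <assert.h>
#include <stdbool.h>

/* Greatest common divisor (hypothetical helper for this sketch).  */
static unsigned
gcd_u (unsigned a, unsigned b)
{
  while (b != 0)
    {
      unsigned t = a % b;
      a = b;
      b = t;
    }
  return a;
}

/* Least common multiple, dividing first to limit overflow.  */
static unsigned
lcm_u (unsigned a, unsigned b)
{
  return a / gcd_u (a, b) * b;
}

/* The remainder as originally written:
   (group_size * VF - gap) % nunits.  */
static unsigned
remain_direct (unsigned group_size, unsigned vf, unsigned gap,
               unsigned nunits)
{
  /* Precondition assumed by the simplification.  */
  assert ((group_size * vf) % nunits == 0);
  return (group_size * vf - gap) % nunits;
}

/* The simplified form: (LCM (nunits, group_size) - gap) % nunits.
   Requires gap < group_size so the subtraction cannot wrap.  */
static unsigned
remain_lcm (unsigned group_size, unsigned gap, unsigned nunits)
{
  assert (gap < group_size);
  return (lcm_u (nunits, group_size) - gap) % nunits;
}

/* Exhaustively compare both forms for all sizes up to MAX, using only
   VFs that make group_size * VF a multiple of nunits and gaps smaller
   than group_size.  Returns true if the two forms always agree.  */
static bool
forms_agree (unsigned max)
{
  for (unsigned nunits = 1; nunits <= max; nunits++)
    for (unsigned group_size = 1; group_size <= max; group_size++)
      for (unsigned vf = 1; vf <= max; vf++)
        {
          if ((group_size * vf) % nunits != 0)
            continue;
          for (unsigned gap = 0; gap < group_size; gap++)
            if (remain_direct (group_size, vf, gap, nunits)
                != remain_lcm (group_size, gap, nunits))
              return false;
        }
  return true;
}
```

For example, with nunits = 4, group_size = 6, gap = 5 (so gap > nunits) and VF = 2, both forms evaluate to (12 - 5) % 4 = 3, since LCM (4, 6) = 12 = group_size * VF.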

Btw, I wonder whether for VL vectors we can rely on a loop mask/length being
present, and thus on "remain" always being X - gap. Then we always access at
most 'gap' elements in excess (the rest is masked off), and a single scalar
iteration is enough? It's only without a loop mask/len that the VF can make us
access too many elements, and also only when nunits > group_size. In turn,
this means we can make peeling a single scalar iteration sufficient by
enforcing a niter mask?

Do we have a testcase where peeling a single scalar iteration isn't enough for
VL vectors?
