https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105219

--- Comment #16 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to rsand...@gcc.gnu.org from comment #15)
> (In reply to Richard Biener from comment #14)
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index d7bc34636bd..3b63ab7b669 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -9977,7 +9981,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> >                             lowest_vf) - 1
> >            : wi::udiv_floor (loop->nb_iterations_upper_bound + bias_for_lowest,
> >                              lowest_vf) - 1);
> > -      if (main_vinfo)
> > +      if (main_vinfo && !main_vinfo->peeling_for_alignment)
> >         {
> >           unsigned int bound;
> >           poly_uint64 main_iters
> It might be better to add the maximum peeling amount to main_iters.
> Maybe you'd prefer this anyway for GCC 12 though.
> 
> I wonder if there's a similar problem for peeling for gaps,
> in cases where the epilogue doesn't need the same peeling.

I don't quite understand the code in if (main_vinfo), but the point is
that in our case main_iters is zero (and so would prologue_iters be, if
it existed).  I'm not sure how the code can be adjusted for that, given
that it computes upper bounds and uses min() for the upper bound of the
epilogue.  We'd need to adjust that with a max (2*vf-2, old-upper-bound)
when there's prologue peeling and the shortcut exists (I don't actually
compute that).
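
To make the suggested adjustment concrete, here is a minimal standalone
sketch of the max () step.  It is not a patch to vect_transform_loop: the
function name, the parameters and the boolean flag are hypothetical, and
it assumes the bound is expressed as a plain unsigned count rather than
the widest_int / poly_uint64 values the real code works with.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper, not GCC code.  Returns the adjusted upper bound
// for the epilogue.
//   old_upper_bound      - the bound as computed today (assumed input)
//   vf                   - vectorization factor of the main loop
//   peeling_and_shortcut - assumed flag: prologue peeling is in use and
//                          the shortcut to the epilogue exists
std::uint64_t
adjusted_epilogue_bound (std::uint64_t old_upper_bound, std::uint64_t vf,
                         bool peeling_and_shortcut)
{
  // 2 * vf - 2 would wrap around for vf == 0.
  assert (vf >= 1);

  if (!peeling_and_shortcut)
    return old_upper_bound;

  // The max (2*vf-2, old-upper-bound) from the text above.
  return std::max<std::uint64_t> (2 * vf - 2, old_upper_bound);
}
```

With vf = 4 and an old bound of 3, the adjusted bound becomes 6 when the
flag is set and stays 3 otherwise; an old bound of 10 is left unchanged
either way.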

Peeling for gaps means we run the epilogue for main VF more iterations,
but that would just mean the vectorized epilogue executes one more time
and has peeling for gaps applied as well, so the scalar epilogue runs
for epilogue VF more iterations.

I'm not sure what conditions prevent epilogue vectorization, but I think
there were at least some.
