https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103116
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|NEW                           |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
We could make peeling for gaps handle this by turning it from a flag into a
count of the vector(!?) iterations we need to peel. We're doing the "correct"
thing in adjusting the IV increment via
  if (slp_perm
      && (group_size != scalar_lanes
          || !multiple_p (nunits, group_size)))
    {
      /* We don't yet generate such SLP_TREE_LOAD_PERMUTATIONs for
         variable VF; see vect_transform_slp_perm_load.  */
      unsigned int const_vf = vf.to_constant ();
      unsigned int const_nunits = nunits.to_constant ();
      vec_num = CEIL (group_size * const_vf, const_nunits);
      group_gap_adj = vf * group_size - nunits * vec_num;
The problem also shows up for loops like
  for (int i = 0; i < COUNT; ++i)
    {
      x[i * 4] = y[i * 3] + 1;
      x[i * 4 + 1] = y[i * 3] + 2;
      x[i * 4 + 2] = y[i * 3 + 1] + 3;
      x[i * 4 + 3] = y[i * 3 + 2] + 4;
    }
where we cannot use a smaller vector type.
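
To make the arithmetic concrete, here is a stand-alone sketch (not GCC code) of
what the vec_num / group_gap_adj computation implies for constant VF and
nunits. The helper names excess_elems and scalar_iters_to_peel are made up for
illustration, and the second one assumes a gap-free load group, so that each
remaining scalar iteration keeps group_size further elements in bounds.

```c
/* Same rounding-up division as GCC's CEIL macro.  */
#define CEIL(x, y) (((x) + (y) - 1) / (y))

/* Hypothetical helper, not part of GCC: number of elements the vec_num
   full-vector loads of one vector iteration read beyond the
   vf * group_size elements the scalar loop would touch.  This is
   -group_gap_adj from the quoted snippet.  */
static unsigned
excess_elems (unsigned group_size, unsigned vf, unsigned nunits)
{
  unsigned vec_num = CEIL (group_size * vf, nunits);
  return nunits * vec_num - vf * group_size;
}

/* Hypothetical helper: scalar iterations that would have to be left to
   the epilogue so the excess of the last vector iteration stays within
   memory the scalar loop accesses anyway.  Assumes a group without gaps.  */
static unsigned
scalar_iters_to_peel (unsigned group_size, unsigned vf, unsigned nunits)
{
  return CEIL (excess_elems (group_size, vf, nunits), group_size);
}
```

For the y[] loads in the loop (group_size 3), assuming 4-lane vectors and
VF 1, the excess is one element and a single peeled scalar iteration covers
it. Assuming 16-lane vectors and VF 4 instead, the excess is four elements,
which is more than one scalar iteration provides.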
We could also use masked loads if available, of course (I'm not sure how their
cost compares with peeling for gaps).

A conservative fix would be to detect when peeling for gaps as implemented
is good enough, use it in that case, and otherwise reject vectorization.
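
A rough sketch of what such a detection could look like, for constant VF and
nunits only. This is an illustration, not a patch: the name
single_iter_peel_sufficient is made up, and it assumes that peeling for gaps
as implemented leaves exactly one scalar iteration to the epilogue and that
the load group has no gaps.

```c
#include <stdbool.h>

/* Hypothetical predicate, not GCC code.  Returns true when the elements
   read past the group by the last vector iteration fit into the
   group_size elements that one remaining scalar iteration is known to
   access.  A caller would reject vectorization when this is false.  */
static bool
single_iter_peel_sufficient (unsigned group_size, unsigned vf,
                             unsigned nunits)
{
  /* Elements of the group that land in the last, partially used vector.  */
  unsigned rem = (group_size * vf) % nunits;
  /* Lanes of that vector that lie beyond the group.  */
  unsigned excess = rem ? nunits - rem : 0;
  return excess <= group_size;
}
```

With group_size 3 this accepts 4-lane vectors at VF 1 (excess 1) and rejects
16-lane vectors at VF 4 (excess 4), under the assumptions stated above.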