https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122028
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2025-09-22
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |missed-optimization

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Unfortunately we rely on quite early lowering of load permutations to
implement interleaving (or load/store-lanes), so delaying this decision is
difficult.

There is also a cut-off in data-ref analysis:

    /* For datarefs with big gap, it's better to split them into different
       groups.
       .i.e a[0], a[1], a[2], .. a[7], a[100], a[101],..., a[107]  */
    if ((unsigned HOST_WIDE_INT)(init_b - init_prev)
        > MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT)
      break;

and a fallback in get_load_store_type:

    /* If this is single-element interleaving with an element
       distance that leaves unused vector loads around fall back
       to elementwise access if possible - we otherwise least
       create very sub-optimal code in that case (and
       blow up memory, see PR65518).  */
    if (loop_vinfo
        && single_element_p
        && (*memory_access_type == VMAT_CONTIGUOUS
            || *memory_access_type == VMAT_CONTIGUOUS_REVERSE)
        && maybe_gt (group_size, TYPE_VECTOR_SUBPARTS (vectype)))
      {
        if (SLP_TREE_LANES (slp_node) == 1)
          {
            *memory_access_type = VMAT_ELEMENTWISE;
            if (dump_enabled_p ())
              dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                               "single-element interleaving not supported "
                               "for not adjacent vector loads, using "
                               "elementwise access\n");
          }
        else
          {
            if (dump_enabled_p ())
              dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                               "single-element interleaving not supported "
                               "for not adjacent vector loads\n");
            return false;
          }
      }

But what you are saying is basically that we use an unnecessarily high VF
here. So instead of running into the above, one way would be to set max_vf
based on the constant niter and then reject single-element interleaving
because of its high required VF.
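The following minimal C sketch (not GCC code) makes the data-ref cut-off concrete. It shows how such a threshold splits the a[0..7], a[100..107] example from the quoted comment into two groups. It assumes the following:

* Offsets are already sorted.
* The threshold is a hypothetical ASSUMED_MAX_GAP_BYTES of 64 bytes, standing in for MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT, whose real value depends on the target.
* split_into_groups is an invented name.

```c
#include <stddef.h>

/* ASSUMPTION: stands in for GCC's MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT.
   The real value is target-dependent; 64 bytes (512 bits) is only an
   illustrative guess.  */
#define ASSUMED_MAX_GAP_BYTES 64UL

/* Hypothetical, simplified model of the quoted cut-off.  INIT holds the
   byte offsets of accesses to one base, sorted ascending.  A new group is
   started whenever the distance to the previous access exceeds the
   threshold.  GROUP_ID[i] receives the group of access i; the number of
   groups is returned.  */
static size_t
split_into_groups (const unsigned long *init, size_t n, size_t *group_id)
{
  size_t groups = 0;
  for (size_t i = 0; i < n; i++)
    {
      /* First access, or gap too big: start a new group.  */
      if (i == 0 || init[i] - init[i - 1] > ASSUMED_MAX_GAP_BYTES)
        groups++;
      group_id[i] = groups - 1;
    }
  return groups;
}
```

With 4-byte elements, the offsets 0..28 land in group 0 and 400..428 land in group 1. The 372-byte jump between a[7] and a[100] exceeds the assumed threshold.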
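The fallback in get_load_store_type reduces to a small decision. The sketch below models it with plain integers. Both enum access_type (mirroring, but not being, GCC's VMAT_* values) and check_single_element_interleaving are hypothetical names. Treating the vector element count as a constant, so that maybe_gt becomes a plain >, is a simplification.

```c
#include <stdbool.h>

/* Hypothetical names; these mirror, but are not, GCC's VMAT_* values.  */
enum access_type
{
  ACCESS_CONTIGUOUS,
  ACCESS_CONTIGUOUS_REVERSE,
  ACCESS_ELEMENTWISE,
  ACCESS_OTHER
};

/* Simplified model of the quoted fallback.  GROUP_SIZE is the interleaving
   group size, NUNITS the number of vector elements (a plain constant here
   instead of a poly_int), SLP_LANES the lane count of the SLP node.
   Returns false when the access cannot be vectorized; may downgrade
   *TYPE to elementwise access.  */
static bool
check_single_element_interleaving (bool in_loop, bool single_element_p,
                                   unsigned group_size, unsigned nunits,
                                   unsigned slp_lanes,
                                   enum access_type *type)
{
  if (in_loop
      && single_element_p
      && (*type == ACCESS_CONTIGUOUS || *type == ACCESS_CONTIGUOUS_REVERSE)
      && group_size > nunits)
    {
      if (slp_lanes == 1)
        /* Single lane: fall back to loading each element separately.  */
        *type = ACCESS_ELEMENTWISE;
      else
        /* Several lanes: give up on this access.  */
        return false;
    }
  return true;
}
```

For example, take a group of 16 elements with 4-element vectors. It is downgraded to elementwise access for a single-lane node and rejected for a multi-lane one. A group that fits in one vector is left alone.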
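The suggested alternative could look roughly like this: cap the VF using the constant iteration count, then reject single-element interleaving when the VF it needs exceeds that cap. Everything here is an assumption for illustration:

* The helper names are hypothetical.
* VFs are rounded down to a power of two.
* The required VF is passed in as a parameter rather than derived the way the vectorizer would.

```c
#include <stdbool.h>

/* Hypothetical helper.  Assumptions: VFs are powers of two, NITER is a
   compile-time constant iteration count, and TARGET_MAX_VF is the largest
   VF the target would otherwise pick.  Returns the largest power of two
   not exceeding either bound.  */
static unsigned
max_vf_from_niter (unsigned long niter, unsigned target_max_vf)
{
  unsigned vf = 1;
  while (vf * 2 <= target_max_vf && vf * 2 <= niter)
    vf *= 2;
  return vf;
}

/* Hypothetical check: accept single-element interleaving only if the VF
   it would need (REQUIRED_VF, supplied by the caller) fits under the
   niter-based cap, instead of relying on the later fallback.  */
static bool
single_element_interleaving_ok (unsigned required_vf, unsigned long niter,
                                unsigned target_max_vf)
{
  return required_vf <= max_vf_from_niter (niter, target_max_vf);
}
```

With niter = 6 and a target maximum of 16, the cap is 4. An interleaving scheme needing VF 8 would therefore be rejected, while one needing VF 4 would still be accepted.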