https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122028

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2025-09-22
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |missed-optimization

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Unfortunately we rely on quite early lowering of load permutations to implement
interleaving (or load/store-lane), so delaying this decision is difficult.

There is also a cut-off in data ref analysis:

              /* For datarefs with big gap, it's better to split them into
                 different groups.
                 .i.e a[0], a[1], a[2], .. a[7], a[100], a[101],..., a[107]  */
              if ((unsigned HOST_WIDE_INT)(init_b - init_prev)
                  > MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT)
                break;

and a fallback in get_load_store_type:

      /* If this is single-element interleaving with an element
         distance that leaves unused vector loads around fall back
         to elementwise access if possible - we otherwise least
         create very sub-optimal code in that case (and
         blow up memory, see PR65518).  */
      if (loop_vinfo
          && single_element_p
          && (*memory_access_type == VMAT_CONTIGUOUS
              || *memory_access_type == VMAT_CONTIGUOUS_REVERSE)
          && maybe_gt (group_size, TYPE_VECTOR_SUBPARTS (vectype)))
        {
          if (SLP_TREE_LANES (slp_node) == 1)
            {
              *memory_access_type = VMAT_ELEMENTWISE;
              if (dump_enabled_p ())
                dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                                 "single-element interleaving not supported "
                                 "for not adjacent vector loads, using "
                                 "elementwise access\n");
            }
          else
            {
              if (dump_enabled_p ())
                dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                                 "single-element interleaving not supported "
                                 "for not adjacent vector loads\n");
              return false;

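As a rough illustration of the data-ref cut-off quoted above, here is a minimal standalone sketch, not the GCC code itself. The constant MAX_GAP_BYTES is a hypothetical stand-in for MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT (the real value is target-dependent), and count_groups is an invented helper that only models the "break on big gap" rule, ignoring every other grouping condition the analysis applies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT;
   the real value depends on the target.  */
#define MAX_GAP_BYTES 64u

/* Count the groups formed from byte offsets sorted in ascending order,
   starting a new group whenever the gap to the previous access exceeds
   MAX_GAP_BYTES.  This models only the gap rule, nothing else.  */
static size_t
count_groups (const unsigned long *offsets, size_t n)
{
  if (n == 0)
    return 0;

  size_t groups = 1;
  for (size_t i = 1; i < n; i++)
    {
      /* The unsigned subtraction below assumes sorted input.  */
      assert (offsets[i] >= offsets[i - 1]);
      if (offsets[i] - offsets[i - 1] > MAX_GAP_BYTES)
        groups++;
    }
  return groups;
}
```

With 4-byte elements, the accesses a[0]..a[7] and a[100]..a[107] from the source comment sit at byte offsets 0..28 and 400..428, so under the assumed 64-byte limit the sketch puts them into two groups.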

But what you are saying is basically that we use an unnecessarily high VF here.
So instead of running into the above, one option would be to set max_vf based
on the constant niter and then reject single-element interleaving because of
its high required VF.
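
The idea could be sketched roughly as follows. This is only an illustration of the proposed ordering of decisions, not vectorizer code: cap_max_vf and single_element_interleaving_ok are hypothetical helpers, and plain unsigned integers stand in for the poly-int VF values the vectorizer actually works with.

```c
#include <stdbool.h>

/* Hypothetical helper: cap the maximum vectorization factor by the
   iteration count when that count is a known constant.  */
static unsigned
cap_max_vf (unsigned max_vf, bool niter_known, unsigned long niter)
{
  if (niter_known && niter < max_vf)
    return (unsigned) niter;
  return max_vf;
}

/* Hypothetical helper: accept single-element interleaving only if the
   VF it would require fits under the (possibly capped) maximum.  */
static bool
single_element_interleaving_ok (unsigned required_vf, unsigned max_vf)
{
  return required_vf <= max_vf;
}
```

For example, with a constant niter of 8 an initial max_vf of 64 would be capped to 8, and a single-element interleaving scheme needing a VF of 16 would then be rejected up front rather than reaching the fallback quoted above.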
