On Wed, Sep 17, 2025 at 1:15 PM Robin Dapp <rdapp....@gmail.com> wrote:
>
> > On Wed, Sep 17, 2025 at 9:22 AM Robin Dapp <rdapp....@gmail.com> wrote:
> >>
> >> > We are supposed to not get into
> >> >
> >> >       if (mask_element != index)
> >> >         noop_p = false;
> >>
> >> I guess the problem is the vectype mismatch.  We're checking the
> >> permutation for e.g. V16QI = {0, 1, 2, 3, 8, 9, 10, 11, ...} which, in
> >> isolation, is not a nop.  That's because
> >> nelts_to_build = vf * group_size = 16.
> >>
> >> So either we need to check monotonicity etc. for each punned element
> >> later or we somehow need to pun earlier (as you suggested yesterday).
> >
> > I don't think that would help - the issue is that the group_size is 8 but
> > the elements 4, 5, 6, 7 are gaps that we simply do not load.  That is, the
> > permute code does not anticipate that we turned the contiguous load
> > into a strided one where we do not load a trailing gap, so effectively have
> > group_size == 4?  That is, it's dr_group_size that is "wrong" if we want
> > to apply the load-permutation after our way of gathering the to be permuted
> > elements, as we are not building vectors that have those gaps represented
> > but skipped.
> >
> > Of course this means the early vect_transform_slp_perm_load call computing
> > n_perms cannot anticipate whether we are "re-interpreting" the DR group as
> > strided.  It also means we cannot simply perform a permutation using this
> > function without adjusting this.  But this means we're not actually
> > repeating_p right now, correct?
>
> Yes.
>
> > One could add a gap_skipped parameter to the function and adjust
> >
> >       dr_group_size = DR_GROUP_SIZE (stmt_info);
> >
> > to
> >
> >       dr_group_size = DR_GROUP_SIZE (stmt_info)
> >                       - (gap_skipped ? DR_GROUP_GAP (stmt_info) : 0);
>
> Hmm, guess I'm lost.  I'm only ever seeing a group gap of 0 or 1.  As we're
> analyzing the datarefs all elements are present and AFAIK there is no
> traditional group gap (like e.g. when just accessing the first 6 elements of a
> group of 8).
>
> The number of SLP lanes is 4, though.

For a non-STMT_VINFO_STRIDED_P access the DR_GROUP_SIZE is
basically the DR_STRIDE, because the DR group models contiguous memory.
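
A minimal sketch of the index arithmetic discussed above may make the
mismatch concrete.  It is a simplified standalone model, not the code in
vect_transform_slp_perm_load: the helpers perm_is_noop and
effective_group_size are hypothetical names, and the gap of 4 is an
assumption taken from the "elements 4, 5, 6, 7 are gaps" description, not
a value DR_GROUP_GAP is known to hold in the testcase.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: the group size the permute check would use if the
   trailing gap is not loaded (mirrors the proposed gap_skipped idea).  */
static unsigned
effective_group_size (unsigned dr_group_size, unsigned dr_group_gap,
                      bool gap_skipped)
{
  assert (dr_group_gap <= dr_group_size);
  return dr_group_size - (gap_skipped ? dr_group_gap : 0);
}

/* Hypothetical, simplified model of the no-op check: build
   vf * slp_lanes mask elements and compare each one against the running
   index, as in "if (mask_element != index) noop_p = false;".  */
static bool
perm_is_noop (const unsigned *load_perm, unsigned slp_lanes,
              unsigned vf, unsigned dr_group_size)
{
  unsigned index = 0;
  for (unsigned iter = 0; iter < vf; iter++)
    for (unsigned lane = 0; lane < slp_lanes; lane++, index++)
      {
        /* Element position within the memory covered by the DR group.  */
        unsigned mask_element = iter * dr_group_size + load_perm[lane];
        if (mask_element != index)
          return false;
      }
  return true;
}
```

With load_perm = {0, 1, 2, 3}, four SLP lanes and vf = 4, a group size of 8
yields the elements {0, 1, 2, 3, 8, 9, 10, 11, ...}, so the check fails at
the fifth element.  With an effective group size of 4 every element equals
its index and the permutation is a no-op.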

>
> --
> Regards
>  Robin
>
