On Wed, Sep 17, 2025 at 1:15 PM Robin Dapp <rdapp....@gmail.com> wrote:
>
> > On Wed, Sep 17, 2025 at 9:22 AM Robin Dapp <rdapp....@gmail.com> wrote:
> >>
> >> > We are supposed to not get into
> >> >
> >> >   if (mask_element != index)
> >> >     noop_p = false;
> >>
> >> I guess the problem is the vectype mismatch.  We're checking the
> >> permutation for e.g. V16QI = {0, 1, 2, 3, 8, 9, 10, 11, ...} which, in
> >> isolation, is not a nop.  That's because
> >> nelts_to_build = vf * group_size = 16.
> >>
> >> So either we need to check monotonicity etc. for each punned element
> >> later or we somehow need to pun earlier (as you suggested yesterday).
> >
> > I don't think that would help - the issue is that the group_size is 8 but
> > the elements 4, 5, 6, 7 are gaps that we simply do not load.  That is, the
> > permute code does not anticipate that we turned the contiguous load
> > into a strided one where we do not load a trailing gap, so effectively have
> > group_size == 4?  That is, it's dr_group_size that is "wrong" if we want
> > to apply the load-permutation after our way of gathering the to be permuted
> > elements, as we are not building vectors that have those gaps represented
> > but skipped.
> >
> > Of course this means the early vect_transform_slp_perm_load call computing
> > n_perms cannot anticipate whether we are "re-interpreting" the DR group as
> > strided.  It also means we cannot simply perform a permutation using this
> > function without adjusting this.  But this means we're not actually
> > repeating_p right now, correct?
>
> Yes.
>
> > One could add a gap_skipped parameter to the function and adjust
> >
> >   dr_group_size = DR_GROUP_SIZE (stmt_info);
> >
> > to
> >
> >   dr_group_size = DR_GROUP_SIZE (stmt_info)
> >                   - (gap_skipped ? DR_GROUP_GAP (stmt_info) : 0);
>
> Hmm, guess I'm lost.  I'm only ever seeing a group gap of 0 or 1.
> As we're analyzing the datarefs all elements are present and AFAIK there is
> no traditional group gap (like e.g. when just accessing the first 6 elements
> of a group of 8).
>
> The number of SLP lanes is 4, though.

For a non-STMT_VINFO_STRIDED_P access the DR_GROUP_SIZE is basically the
DR_STRIDE, because the DR group models contiguous memory.

> --
> Regards
>  Robin
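
To make the numbers in this thread concrete, a toy model of the index
computation may help.  This is only a sketch in plain C, not the vectorizer
code: perm_is_noop and effective_group_size are made-up names, and it assumes
that each scalar iteration advances by the DR group size and that the four
trailing elements are the ones we skip.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model, NOT the GCC implementation: compute the element index
   selected for each of the vf * lanes result elements and report whether
   the selection is the identity (i.e. the permutation is a no-op).  */
static bool
perm_is_noop (unsigned vf, unsigned lanes, unsigned dr_group_size,
              const unsigned *load_perm, unsigned *out)
{
  bool noop_p = true;
  for (unsigned i = 0; i < vf * lanes; ++i)
    {
      unsigned iter_num = i / lanes;
      unsigned stmt_num = i % lanes;
      /* Assumption: one scalar iteration spans dr_group_size elements.  */
      out[i] = iter_num * dr_group_size + load_perm[stmt_num];
      if (out[i] != i)
        noop_p = false;
    }
  return noop_p;
}

/* Hypothetical helper: the group size to use when the trailing gap
   elements are never loaded into the vectors being permuted.  */
static unsigned
effective_group_size (unsigned dr_group_size, unsigned skipped_gap)
{
  assert (skipped_gap < dr_group_size);
  return dr_group_size - skipped_gap;
}
```

With vf = 4, lanes = 4 and load_perm = {0, 1, 2, 3}, a dr_group_size of 8
yields {0, 1, 2, 3, 8, 9, 10, 11, ...}, which is not the identity, while
effective_group_size (8, 4) == 4 yields {0, 1, ..., 15} and the check passes.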