https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123969

Robin Dapp <rdapp at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2026-02-04
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
                 CC|                            |rdapp at gcc dot gnu.org

--- Comment #4 from Robin Dapp <rdapp at gcc dot gnu.org> ---
With small adjustments this can also be reproduced with GCC 16.
The strided load is a strided broadcast of an HI element.  We don't actually
use strided loads for those anymore and there's no tunable.  I think Paul
Antoine stumbled over this as well.

But if I force it to use strided broadcast we will happily generate the same
strided load.

That is not the issue, though; it's a vsetvl problem again.  With the
simple strategy we get:

        th.vsetvli      zero,a1,e16,m1
        th.vlse.v       v1,0(a0),zero
        th.vsetvli      zero,a1,e32,m2
        th.vmv.v.i      v2,0
        th.vsetvli      zero,a1,e16,m1
        th.vse.v        v1,0(a0)
        th.vsetvli      zero,a1,e32,m2
        th.vse.v        v2,0(a0)

And it's even the same problem as before with the full-register moves.  Quite
possibly I can even simplify the patch for it because I was under a wrong
assumption.  The theadvector loads and stores don't encode the element width
in the instruction, while the RVV 1.0 ones do.  The latter therefore only need
a matching SEW/LMUL ratio, whereas the former need SEW and LMUL explicitly.
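
To illustrate the difference, here is a rough sketch in Python.  It is only a
toy model of the two demand kinds, not the vsetvl pass itself; the names
sew_lmul_ratio and compatible are made up for illustration.  Both
configurations from the listing above (e16,m1 and e32,m2) have a SEW/LMUL
ratio of 16, so a ratio-only demand would consider them compatible, while a
demand on SEW and LMUL would not.

```python
from fractions import Fraction


def sew_lmul_ratio(sew, lmul):
    """Return SEW/LMUL for a vtype configuration (LMUL may be fractional)."""
    return Fraction(sew) / Fraction(lmul)


def compatible(prev, nxt, ratio_only):
    """Toy check (hypothetical helper): can an instruction that wants `nxt`
    run under the already-active configuration `prev`?

    ratio_only=True  models a load/store that encodes the element width
                     itself and only needs the SEW/LMUL ratio to match.
    ratio_only=False models a load/store that takes the element width from
                     vtype and needs SEW and LMUL to match exactly.
    """
    if ratio_only:
        return sew_lmul_ratio(*prev) == sew_lmul_ratio(*nxt)
    return prev == nxt


# The two configurations that alternate in the listing above.
e16m1 = (16, Fraction(1))
e32m2 = (32, Fraction(2))

print(compatible(e16m1, e32m2, ratio_only=True))   # True
print(compatible(e16m1, e32m2, ratio_only=False))  # False
```

Under this simplified model, the switches back to e16,m1 and e32,m2 before the
two stores are needed when SEW and LMUL are demanded explicitly, but not when
only the ratio is.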
