https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123969
Robin Dapp <rdapp at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Last reconfirmed| |2026-02-04
Ever confirmed|0 |1
Status|UNCONFIRMED |NEW
CC| |rdapp at gcc dot gnu.org
--- Comment #4 from Robin Dapp <rdapp at gcc dot gnu.org> ---
With small adjustments this can also be reproduced with GCC 16.
The strided load is a strided broadcast of an HI element. We don't actually
use strided loads for those anymore and there's no tunable. I think Paul
Antoine stumbled over this as well.
But if I force it to use a strided broadcast, we happily generate the same
strided load.
The strided load itself is not the issue, though. It's a vsetvl problem
again. With the simple strategy we get:
        th.vsetvli      zero,a1,e16,m1
        th.vlse.v       v1,0(a0),zero
        th.vsetvli      zero,a1,e32,m2
        th.vmv.v.i      v2,0
        th.vsetvli      zero,a1,e16,m1
        th.vse.v        v1,0(a0)
        th.vsetvli      zero,a1,e32,m2
        th.vse.v        v2,0(a0)
It's even the same problem as before with the full-register moves, and quite
possibly I can simplify the patch for it because I was under a wrong
assumption. The theadvector loads and stores don't encode the element width
in the instruction, while the RVV 1.0 ones do. The latter only need a matching
SEW/LMUL ratio, whereas the former need SEW and LMUL explicitly.
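
To illustrate the difference in what a load or store demands from the current
vtype, here is a minimal Python sketch. It is only a toy model, not the logic
of GCC's vsetvl pass: the names VType and satisfies are hypothetical, and it
assumes that "ratio" means SEW/LMUL.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class VType:
    """Hypothetical model of the SEW/LMUL part of vtype."""
    sew: int         # selected element width in bits
    lmul: Fraction   # register group multiplier

    @property
    def ratio(self) -> Fraction:
        # SEW/LMUL ratio (assumed meaning of "ratio" in the comment above).
        return Fraction(self.sew) / self.lmul


def satisfies(current: VType, wanted: VType, encodes_eew: bool) -> bool:
    """Can a load/store that wants `wanted` run under `current`?

    encodes_eew=True models RVV 1.0, where the element width is part of
    the instruction, so only the ratio has to match.
    encodes_eew=False models theadvector, where the element width comes
    from vtype, so SEW and LMUL both have to match exactly.
    """
    if encodes_eew:
        return current.ratio == wanted.ratio
    return current == wanted


e16m1 = VType(16, Fraction(1))
e32m2 = VType(32, Fraction(2))

# Both configurations from the listing above have the same ratio.
print(e16m1.ratio == e32m2.ratio)
print(satisfies(e16m1, e32m2, encodes_eew=True))
print(satisfies(e16m1, e32m2, encodes_eew=False))
```

In this model the e16,m1 and e32,m2 configurations from the listing share a
ratio, so the RVV 1.0 style check accepts one in place of the other, while the
theadvector style check insists on an exact SEW and LMUL match.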