https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123806
--- Comment #5 from Jeffrey A. Law <law at gcc dot gnu.org> ---
Just want to get a few thoughts written down...

ISTM the core of the problem is that vector instructions kind of implicitly set VL, because they can trigger vsetvl insertion. Yet we don't model that at all, which means that even if we expand the RTL in the preferred way, things can still get messed up in various ways until the dataflow is accurate (after vsetvl insertion).

I think that argues that, up to vsetvl insertion, we probably need to keep the FoF (fault-only-first) load and the VL read as an atomic unit. That implies two sets in a PARALLEL. We'd likely need to model that as needing a particular vector configuration (for the load), but clobbering the vector configuration (so that we get a fresh vsetvl after the FoF load & VL read).

Once we're done with vsetvl insertion we can probably split the insn into the FoF load and the CSR read, as the dataflow should be correct at that point.

At least that's what my oxygen-deprived brain came up with on the flight home today.
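
To make the argument above concrete, here is a toy model in Python. It is only a sketch under stated assumptions and is not GCC code: the Insn record, the insert_vsetvl pass, the run simulator and the instruction labels ("fof_load", "read_vl", "vadd") are all hypothetical, and the real vsetvl pass is far more involved. It assumes a FoF load that faults at element 3, so the hardware shrinks VL to 3, and a later vector insn that wants VL=4.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Insn:
    """Hypothetical, heavily simplified instruction record (not a GCC type)."""
    name: str
    needs_vl: Optional[int] = None  # VL this insn requires; None if not a vector insn
    sets_vl: Optional[int] = None   # explicit vsetvl: writes this VL
    fof: bool = False               # fault-only-first load: hardware may shrink VL
    reads_vl: bool = False          # copies the VL CSR into a scalar register
    clobbers_cfg: bool = False      # the pass must forget the configuration afterwards


def insert_vsetvl(insns: List[Insn]) -> List[Insn]:
    """Toy insertion pass: emit a vsetvl whenever the required VL differs
    from the VL the pass believes is live. It knows nothing about FoF loads
    changing VL unless an insn says so via clobbers_cfg."""
    out: List[Insn] = []
    tracked: Optional[int] = None
    for insn in insns:
        if insn.needs_vl is not None and insn.needs_vl != tracked:
            out.append(Insn(f"vsetvl {insn.needs_vl}", sets_vl=insn.needs_vl))
            tracked = insn.needs_vl
        out.append(insn)
        if insn.clobbers_cfg:
            tracked = None  # force a fresh vsetvl for the next vector insn
    return out


def run(insns: List[Insn], fault_limit: int) -> Optional[int]:
    """Execute the stream and return the VL value seen by the VL read."""
    vl, observed = 0, None
    for insn in insns:
        if insn.sets_vl is not None:
            vl = insn.sets_vl
        if insn.fof:
            vl = min(vl, fault_limit)  # the load stops early and shrinks VL
        if insn.reads_vl:
            observed = vl
    return observed


# Load and VL read expanded as separate insns; another vector insn has
# ended up between them before vsetvl insertion runs.
split_early = [
    Insn("fof_load", needs_vl=8, fof=True),
    Insn("vadd", needs_vl=4),
    Insn("read_vl", reads_vl=True),
]

# Load and VL read kept as one unit that needs a configuration and clobbers it.
fused_unit = Insn("fof_load+read_vl", needs_vl=8, fof=True,
                  reads_vl=True, clobbers_cfg=True)
fused = [fused_unit, Insn("vadd", needs_vl=4)]

print(run(insert_vsetvl(split_early), fault_limit=3))  # 4: the inserted vsetvl overwrote VL
print(run(insert_vsetvl(fused), fault_limit=3))        # 3: the read sees the FoF result
```

In the split stream the vsetvl inserted for the intervening insn overwrites VL before it is read, so the read returns 4 instead of 3. In the fused stream nothing can land between the load and the read, and the clobber makes the toy pass emit a fresh vsetvl for the next vector insn even when that insn wants the same VL as the load did.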
