One more note: we found a real case in SPEC 2006 where SLP converts two 8-bit values into an int8x2_t, and that value is live across a function call. Only 16 bits need to be saved and restored, but because the backend uses a VLA mode, the save/restore covers the full VLEN bits. You can imagine that as VLEN gets larger, the performance penalty grows as well, which is the opposite of what we expect: larger VLEN, better performance.
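For illustration only, here is a minimal C sketch of the shape of code described above. It is not the actual SPEC 2006 source. The names `opaque_call`, `add_pairs`, `vls_spill_bytes` and `vla_spill_bytes` are hypothetical, and whether a given compiler really packs the two lanes and spills them is an assumption. The two helpers just do the arithmetic: the 2 bytes that need to be preserved versus the VLEN/8 bytes of a whole-register spill (assuming a single vector register).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opaque callee standing in for the real function call.
   noinline is a GCC/Clang attribute, used so the call stays in place. */
static int call_count;
__attribute__((noinline)) void opaque_call(void) { call_count++; }

/* Two adjacent 8-bit additions in straight-line code: the kind of
   non-loop SLP candidate that could be packed into one 16-bit vector. */
void add_pairs(int8_t *restrict out, const int8_t *restrict a,
               const int8_t *restrict b)
{
    int8_t s0 = (int8_t)(a[0] + b[0]);
    int8_t s1 = (int8_t)(a[1] + b[1]);
    opaque_call();          /* s0 and s1 are live across this call */
    out[0] = s0;
    out[1] = s1;
}

/* Bytes that actually need to be preserved: two 8-bit lanes. */
unsigned vls_spill_bytes(void) { return 2 * (unsigned)sizeof(int8_t); }

/* Bytes preserved when one whole vector register is spilled
   (assumption: a single register of vlen_bits bits). */
unsigned vla_spill_bytes(unsigned vlen_bits)
{
    assert(vlen_bits % 8 == 0);
    return vlen_bits / 8;
}
```

Under these assumptions the fixed-size case always preserves 2 bytes, while the whole-register case preserves 16 bytes at VLEN=128 and 128 bytes at VLEN=1024.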
On Tue, May 30, 2023 at 5:11 PM Kito Cheng <kito.ch...@sifive.com> wrote:
>
> (I am still on the meeting hell, and will be released very later,
> apology for short and incomplete reply, and will reply complete later)
>
> One point for adding VLS mode support is because SLP, especially for
> those SLP candidate not in the loop, those case use VLS type can be
> better, of cause using larger safe VLA type can optimize too, but that
> will cause one issue we found in RISC-V in LLVM - it will spill/reload
> whole register instead of exact size.
>
> e.g.
>
> int32x4_t a;
> // def a
> // spill a
> foo ()
> // reload a
> // use a
>
> Consider we use a VLA mode for a, it will spill and reload with whole
> register VLA mode
> Online demo here: https://godbolt.org/z/Y1fThbxE6
>
> On Tue, May 30, 2023 at 5:05 PM Robin Dapp <rdapp....@gmail.com> wrote:
> >
> > >>> but ideally the user would be able to specify -mrvv-size=32 for an
> > >>> implementation with 32 byte vectors and then vector lowering would make use
> > >>> of vectors up to 32 bytes?
> > >
> > > Actually, we don't want to specify -mrvv-size = 32 to enable
> > > vectorization on GNU vectors.
> > > You can take a look this example:
> > > https://godbolt.org/z/3jYqoM84h
> > >
> > > GCC need to specify the mrvv size to enable GNU vectors and the codegen
> > > only can run on CPU with vector-length = 128bit.
> > > However, LLVM doesn't need to specify the vector length, and the codegen
> > > can run on any CPU with RVV vector-length >= 128 bits.
> > >
> > > This is what this patch want to do.
> > >
> > > Thanks.
> > I think Richard's question was rather if it wasn't better to do it more
> > generically and lower vectors to what either the current cpu or what the
> > user specified rather than just 16-byte vectors (i.e. indeed a fixed
> > vlmin and not a fixed vlmin == fixed vlmax).
> >
> > This patch assumes everything is fixed for optimization purposes and then
> > switches over to variable-length when nothing can be changed anymore. That
> > is, we would work on "vlmin"-sized chunks in a VLA fashion at runtime?
> > We would need to make sure that no pass after reload makes use of VLA
> > properties at all.
> >
> > In general I don't have a good overview of which optimizations we gain by
> > such an approach or rather which ones are prevented by VLA altogether?
> > What's the idea for the future? Still use LEN_LOAD et al. (and masking)
> > with "fixed vlmin"? Wouldn't we select different IVs with this patch than
> > what we would have for pure VLA?
> >
> > Regards
> > Robin