On Tue, 26 May 2020, Kewen.Lin wrote: > Hi Richi, > > on 2020/5/26 下午3:12, Richard Biener wrote: > > On Tue, 26 May 2020, Kewen.Lin wrote: > > > >> Hi all, > >> > >> This patch set adds support for vector load/store with length, Power > >> ISA 3.0 brings instructions lxvl/stxvl to perform vector load/store with > >> length, it's good to be exploited for those cases we don't have enough > >> stuffs to fill in the whole vector like epilogues. > >> > >> This support mainly refers to the handlings for fully-predicated loop > >> but it also covers the epilogue usage. Now it supports two modes > >> controlled by parameter vect-with-length-scope, it can support any > >> loops fully with length or just for those cases with small iteration > >> counts less than VF like epilogue, for now I don't have ready env to > >> benchmark it, but based on the current inefficient length generation, > >> I don't think it's a good idea to adopt vector with length for any loops. > >> For the main loop which used to be vectorized, it increases register > >> pressure and introduces extra computation for length, the pro for icache > >> seems not comparable. But I think it might be a good idea to keep this > >> parameter there for functionality testing, further benchmarking and other > >> ports' potential future supports. > > > > Can you explain in more detail what "vector load/store with length" does? > > Is that a "simplified" masked load/store which instead of masking > > arbitrary elements (and need a mask computed in the first place), masks > > elements > N (the length operand)? Thus assuming a lane IV decrementing > > to zero that IV would be the natural argument for the length operand? > > If that's correct, what data are the remaining lanes filled with? > > > > The vector load/store have one GPR holding one length in bytes (called as > n here) to control how many bytes we want to load/store. If n > vector_size > (on Power it's 16), n will be taken as 16, if n is zero, the storage access > won't happen, the result for load is vector zero, if n > 0 but < vector_size, > the remaining lanes will be filled with zero. On Power, it checks 0:7 bits > of the length GPR, so the length should be adjusted. > > Your understanding is correct! It's a "simplified" masked load/store, need > the length in bytes computed, only support continuous access. For the lane > IV, the one should multiply with the lane bytes and be adjusted if needed. > > > From a look at the series description below you seem to add a new way > > of doing loads for this. Did you review other ISAs (those I'm not > > familiar with myself too much are SVE, RISC-V and GCN) in GCC whether > > they have similar support and whether your approach can be supported > > there? ISTR SVE must have some similar support - what's the reason > > you do not piggy-back on that? > > I may miss something, but I didn't find there is any support that meet with > this in GCC. Good suggestion on ISAs, I didn't review other ISAs, for the > current implementation, I heavily referred to existing SVE fully predicated > loop support, it's stronger than "with length", it can deal with arbitrary > elements, only perform for the loop fully etc. > > > > > I think a load like I described above might be represented as > > > > _1 = __VIEW_CONVERT <v4df_t> (__MEM <double[n_2]> ((double *)p_3)); > > > > not sure if that actually works out though. But given it seems it > > is a contiguous load we shouldn't need an internal function here? > > [there's a possible size mismatch in the __VIEW_CONVERT above, I guess > > on RTL you end up with a paradoxical subreg - or an UNSPEC] > > IIUC, what you suggested is to avoid use IFN here. May I know the reason? > is it due to the maintenance efforts on various IFNs? I thought using > IFN is gentle. And agreed, I had the concern that the size mismatch > problem here since the length can be large (extremely large probably, it > depends on target saturated limit), the converted vector size can be small. > Besides, the length can be a variable. > > > > > That said, I'm not very happy seeing yet another way of doing loads > > [for fully predicated loops]. I'd rather like to see a single > > representation on GIMPLE at least. > > OK. Does it mean IFN isn't counted into this scope? :)
Sure going with a new IFN is a possibility. I'd just avoid that if possible ;) How does SVE represent loads/stores for whilelt loops? Would it not be possible to share that representation somehow? Richard. > > > > Will dig into the patch once the actual workings of those load/store with > > length is confirmed. > > Thanks a lot for your time! > > > > > I don't spot tree-vect-slp.c being changed - maybe that's not necessary > > for SLP operation, but please do not introduce new vectorizer features > > without supporting SLP operation at this point. > > Got it. SLP is also one part to be supported, I forgot to state in the > original email. I'll let it be for now. :) > > BR, > Kewen > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)