On Tue, 26 May 2020, Kewen.Lin wrote:

> Hi Richi,
> 
> on 2020/5/26 下午3:12, Richard Biener wrote:
> > On Tue, 26 May 2020, Kewen.Lin wrote:
> > 
> >> Hi all,
> >>
> >> This patch set adds support for vector load/store with length, Power 
> >> ISA 3.0 brings instructions lxvl/stxvl to perform vector load/store with
> >> length, it's good to be exploited for those cases we don't have enough
> >> stuffs to fill in the whole vector like epilogues.
> >>
> >> This support mainly refers to the handlings for fully-predicated loop
> >> but it also covers the epilogue usage.  Now it supports two modes
> >> controlled by parameter vect-with-length-scope, it can support any
> >> loops fully with length or just for those cases with small iteration
> >> counts less than VF like epilogue, for now I don't have ready env to
> >> benchmark it, but based on the current inefficient length generation,
> >> I don't think it's a good idea to adopt vector with length for any loops.
> >> For the main loop which used to be vectorized, it increases register
> >> pressure and introduces extra computation for length, the pro for icache
> >> seems not comparable.  But I think it might be a good idea to keep this
> >> parameter there for functionality testing, further benchmarking and other
> >> ports' potential future supports.
> > 
> > Can you explain in more detail what "vector load/store with length" does?
> > Is that a "simplified" masked load/store which instead of masking 
> > arbitrary elements (and need a mask computed in the first place), masks
> > elements > N (the length operand)?  Thus assuming a lane IV decrementing
> > to zero that IV would be the natural argument for the length operand?
> > If that's correct, what data are the remaining lanes filled with?
> > 
> 
> The vector load/store have one GPR holding one length in bytes (called as
> n here) to control how many bytes we want to load/store.  If n > vector_size
> (on Power it's 16), n will be taken as 16, if n is zero, the storage access
> won't happen, the result for load is vector zero, if n > 0 but < vector_size,
> the remaining lanes will be filled with zero.  On Power, it checks 0:7 bits
> of the length GPR, so the length should be adjusted.
> 
> Your understanding is correct!  It's a "simplified" masked load/store, need
> the length in bytes computed, only support continuous access.  For the lane
> IV, the one should multiply with the lane bytes and be adjusted if needed.
> 
> > From a look at the series description below you seem to add a new way
> > of doing loads for this.  Did you review other ISAs (those I'm not
> > familiar with myself too much are SVE, RISC-V and GCN) in GCC whether
> > they have similar support and whether your approach can be supported
> > there?  ISTR SVE must have some similar support - what's the reason
> > you do not piggy-back on that?
> 
> I may miss something, but I didn't find there is any support that meet with
> this in GCC.  Good suggestion on ISAs, I didn't review other ISAs, for the
> current implementation, I heavily referred to existing SVE fully predicated
> loop support, it's stronger than "with length", it can deal with arbitrary
> elements, only perform for the loop fully etc.
> 
> > 
> > I think a load like I described above might be represented as
> > 
> > _1 = __VIEW_CONVERT <v4df_t> (__MEM <double[n_2]> ((double *)p_3));
> > 
> > not sure if that actually works out though.  But given it seems it
> > is a contiguous load we shouldn't need an internal function here?
> > [there's a possible size mismatch in the __VIEW_CONVERT above, I guess
> > on RTL you end up with a paradoxical subreg - or an UNSPEC]
> 
> IIUC, what you suggested is to avoid use IFN here.  May I know the reason?
> is it due to the maintenance efforts on various IFNs?  I thought using
> IFN is gentle.  And agreed, I had the concern that the size mismatch
> problem here since the length can be large (extremely large probably, it
> depends on target saturated limit), the converted vector size can be small.
> Besides, the length can be a variable.
>
> > 
> > That said, I'm not very happy seeing yet another way of doing loads
> > [for fully predicated loops].  I'd rather like to see a single
> > representation on GIMPLE at least.
> 
> OK.  Does it mean IFN isn't counted into this scope?  :)

Sure going with a new IFN is a possibility.  I'd just avoid that if
possible ;)  How does SVE represent loads/stores for whilelt loops?
Would it not be possible to share that representation somehow?

Richard.

> > 
> > Will dig into the patch once the actual workings of those load/store with
> > length is confirmed.
> 
> Thanks a lot for your time!
> 
> > 
> > I don't spot tree-vect-slp.c being changed - maybe that's not necessary
> > for SLP operation, but please do not introduce new vectorizer features
> > without supporting SLP operation at this point.
> 
> Got it.  SLP is also one part to be supported, I forgot to state in the
> original email.  I'll let it be for now.  :)
> 
> BR,
> Kewen
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Reply via email to