On Tue, 26 May 2020, Kewen.Lin wrote:

> Hi all,
> 
> This patch set adds support for vector load/store with length.  Power
> ISA 3.0 brings the instructions lxvl/stxvl to perform vector load/store
> with length, which is good to exploit for cases where we don't have
> enough data to fill the whole vector, such as epilogues.
> 
> This support mainly follows the handling for fully-predicated loops,
> but it also covers the epilogue usage.  It currently supports two modes,
> controlled by the parameter vect-with-length-scope: it can use length
> for any loop fully, or only for cases with small iteration counts
> less than VF, such as epilogues.  For now I don't have an environment
> ready to benchmark it, but given the current inefficient length
> generation, I don't think it's a good idea to adopt vector with length
> for all loops.  For a main loop which would be vectorized anyway, it
> increases register pressure and introduces extra computation for the
> length, and the icache benefit doesn't seem to make up for that.  But I
> think it might be a good idea to keep this parameter for functionality
> testing, further benchmarking and other ports' potential future support.

Can you explain in more detail what "vector load/store with length" does?
Is it a "simplified" masked load/store which, instead of masking
arbitrary elements (and needing a mask computed in the first place), masks
all elements beyond the first N (N being the length operand)?  If so, a
lane IV decrementing to zero would be the natural argument for the length
operand.  And if that's correct, what data are the remaining lanes filled
with?
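
To make the question concrete, here is a scalar C model of what I take
such a load/store to mean.  This is only a sketch: the names vec16,
load_with_len, store_with_len and copy_bytes are made up for illustration,
the length is counted in bytes, and zero-filling of the bytes past the
length is an assumption, not something the patch description states.

```c
#include <stddef.h>
#include <string.h>

#define VEC_BYTES 16

/* Hypothetical 16-byte vector register, modeled as a byte array.  */
typedef struct { unsigned char b[VEC_BYTES]; } vec16;

/* Model of a load with length: read only the first LEN bytes from P.
   ASSUMPTION: the bytes past LEN are zero-filled.  */
static vec16 load_with_len(const void *p, size_t len)
{
    vec16 v;
    if (len > VEC_BYTES)
        len = VEC_BYTES;
    memset(v.b, 0, VEC_BYTES);
    memcpy(v.b, p, len);
    return v;
}

/* Model of a store with length: write only the first LEN bytes to P,
   leaving memory past them untouched.  */
static void store_with_len(void *p, vec16 v, size_t len)
{
    if (len > VEC_BYTES)
        len = VEC_BYTES;
    memcpy(p, v.b, len);
}

/* Fully "length-predicated" copy loop: the IV counts the remaining
   bytes down to zero and directly provides the length operand.  */
void copy_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t remaining = n;
    while (remaining > 0) {
        size_t len = remaining < VEC_BYTES ? remaining : VEC_BYTES;
        vec16 v = load_with_len(src, len);
        store_with_len(dst, v, len);
        src += len;
        dst += len;
        remaining -= len;
    }
}
```

In this model no mask is ever computed; the only per-iteration extra work
is the min () that clamps the remaining count to the vector size.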

From a look at the series description below you seem to add a new way
of doing loads for this.  Did you review other ISAs in GCC (the ones I'm
not too familiar with myself are SVE, RISC-V and GCN) to see whether
they have similar support and whether your approach can be supported
there?  ISTR SVE must have some similar support - what's the reason
you do not piggy-back on that?

I think a load like I described above might be represented as

_1 = __VIEW_CONVERT <v4df_t> (__MEM <double[n_2]> ((double *)p_3));

not sure if that actually works out though.  But given it seems it
is a contiguous load we shouldn't need an internal function here?
[there's a possible size mismatch in the __VIEW_CONVERT above, I guess
on RTL you end up with a paradoxical subreg - or an UNSPEC]

That said, I'm not very happy seeing yet another way of doing loads
[for fully predicated loops].  I'd rather like to see a single
representation on GIMPLE at least.

Will dig into the patch once the actual workings of these loads/stores with
length are confirmed.

I don't spot tree-vect-slp.c being changed - maybe that's not necessary
for SLP operation, but please do not introduce new vectorizer features
without supporting SLP operation at this point.

Thanks,
Richard.

> As we don't have any benchmarking, this support isn't enabled by default
> for any particular CPU; all testing is done with explicit parameter settings.
> 
> Bootstrapped on powerpc64le-linux-gnu P9 with all vect-with-length-scope
> settings (0/1/2).  Regression testing passed with vect-with-length-scope 0;
> for the other two, several vector-related test cases need to be updated, but
> no remarkable failures were found.  BTW, P9 is the one which supports the
> functionality, but it's not ready for evaluating the performance.
> 
> There are still many things to be supported or improved, not limited to:
>   - reduction/live-out support
>   - Cost model adding/tweaking
>   - IFN gimple folding
>   - Some unnecessary ops improvements eg: vector_size check
>   - Some possible refactoring
> I'll support/post the patches gradually.
> 
> Any comments are highly appreciated.
> 
> BR,
> Kewen
> -----
> 
> Patch set outline:
>   [PATCH 1/7] ifn/optabs: Support vector load/store with length
>   [PATCH 2/7] rs6000: lenload/lenstore optab support
>   [PATCH 3/7] vect: Factor out codes for niters smaller than vf check
>   [PATCH 4/7] hook/rs6000: Add vectorize length mode for vector with length
>   [PATCH 5/7] vect: Support vector load/store with length in vectorizer
>   [PATCH 6/7] ivopts: Add handlings for vector with length IFNs
>   [PATCH 7/7] rs6000/testsuite: Vector with length test cases
> 
>  gcc/config/rs6000/rs6000.c                                  |   3 +
>  gcc/config/rs6000/vsx.md                                    |  30 ++++++++++
>  gcc/doc/invoke.texi                                         |   7 +++
>  gcc/doc/md.texi                                             |  16 ++++++
>  gcc/doc/tm.texi                                             |   6 ++
>  gcc/doc/tm.texi.in                                          |   2 +
>  gcc/internal-fn.c                                           |  13 ++++-
>  gcc/internal-fn.def                                         |   6 ++
>  gcc/optabs.def                                              |   2 +
>  gcc/params.opt                                              |   4 ++
>  gcc/target.def                                              |   7 +++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-1.h          |  18 ++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-2.h          |  17 ++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-3.h          |  31 +++++++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-4.h          |  24 ++++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-5.h          |  29 ++++++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-6.h          |  32 +++++++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-1.c     |  15 +++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-2.c     |  15 +++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-3.c     |  18 ++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-4.c     |  15 +++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-5.c     |  15 +++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-6.c     |  16 ++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-run-1.c |  10 ++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-run-2.c |  10 ++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-run-3.c |  10 ++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-run-4.c |  10 ++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-run-5.c |  10 ++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-run-6.c |  10 ++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-1.c     |  16 ++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-2.c     |  16 ++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-3.c     |  17 ++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-4.c     |  16 ++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-5.c     |  16 ++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c     |  16 ++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-run-1.c |  10 ++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-run-2.c |  10 ++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-run-3.c |  10 ++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-run-4.c |  10 ++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-run-5.c |  10 ++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-run-6.c |  10 ++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-run-1.h      |  34 ++++++++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-run-2.h      |  36 ++++++++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-run-3.h      |  34 ++++++++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-run-4.h      |  62 +++++++++++++++++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-run-5.h      |  45 +++++++++++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-run-6.h      |  52 +++++++++++++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length.h            |  14 +++++
>  gcc/tree-ssa-loop-ivopts.c                                  |   4 ++
>  gcc/tree-vect-loop-manip.c                                  | 268 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  gcc/tree-vect-loop.c                                        | 272 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
>  gcc/tree-vect-stmts.c                                       | 152 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  gcc/tree-vectorizer.h                                       |  32 +++++++++++
>  53 files changed, 1545 insertions(+), 18 deletions(-)
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
