> On 06.09.2024 at 16:05, Robin Dapp <rdapp....@gmail.com> wrote:
>
> Hi,
>
> PR112694 shows that we try to create sub-vectors of single-element
> vectors because can_duplicate_and_interleave_p returns true.
Can we avoid querying the function? CCing Richard, who should know more about
this.
Richard
> The problem resurfaced in PR116611.
>
> This patch makes can_duplicate_and_interleave_p return false
> if count / nvectors == 0, i.e. when nvectors > count, and removes the
> corresponding check in the riscv backend.
>
> This partially gets rid of the FAIL in slp-19a.c. At least when built
> with cost model we don't have LOAD_LANES anymore. Without cost model,
> as in the test suite, we choose a different path and still end up with
> LOAD_LANES.
>
> Bootstrapped and regtested on x86 and power10, regtested on
> rv64gcv_zvfh_zvbb. Still waiting for the aarch64 results.
>
> Regards
> Robin
>
> gcc/ChangeLog:
>
> PR target/112694
> PR target/116611
>
> * config/riscv/riscv-v.cc (expand_vec_perm_const): Remove early
> return.
> * tree-vect-slp.cc (can_duplicate_and_interleave_p): Return
> false when we cannot create sub-elements.
> ---
> gcc/config/riscv/riscv-v.cc | 9 ---------
> gcc/tree-vect-slp.cc | 4 ++++
> 2 files changed, 4 insertions(+), 9 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index 9b6c3a21e2d..5c5ed63d22e 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -3709,15 +3709,6 @@ expand_vec_perm_const (machine_mode vmode, machine_mode op_mode, rtx target,
> mask to do the iteration loop control. Just disable it directly. */
> if (GET_MODE_CLASS (vmode) == MODE_VECTOR_BOOL)
> return false;
> - /* FIXME: Explicitly disable VLA interleave SLP vectorization when we
> - may encounter ICE for poly size (1, 1) vectors in loop vectorizer.
> - Ideally, middle-end loop vectorizer should be able to disable it
> - itself, We can remove the codes here when middle-end code is able
> - to disable VLA SLP vectorization for poly size (1, 1) VF. */
> - if (!BYTES_PER_RISCV_VECTOR.is_constant ()
> - && maybe_lt (BYTES_PER_RISCV_VECTOR * TARGET_MAX_LMUL,
> - poly_int64 (16, 16)))
> - return false;
>
> struct expand_vec_perm_d d;
>
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 3d2973698e2..17b59870c69 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -434,6 +434,10 @@ can_duplicate_and_interleave_p (vec_info *vinfo, unsigned int count,
> unsigned int nvectors = 1;
> for (;;)
> {
> + /* We need to be able to fuse COUNT / NVECTORS elements together,
> + so no point in continuing if there are none. */
> + if (nvectors > count)
> + return false;
> scalar_int_mode int_mode;
> poly_int64 elt_bits = elt_bytes * BITS_PER_UNIT;
> if (int_mode_for_size (elt_bits, 1).exists (&int_mode))
> --
> 2.46.0
>
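
To make the effect of the new early exit easier to discuss, here is a rough
standalone model of the loop. It is only a sketch and not the GCC code:
model_can_duplicate_and_interleave, unit_bytes and target_supports are
hypothetical names, target_supports stands in for the mode and permute checks
the real function performs, and the halving of the element size with a
doubling of nvectors per iteration is an assumption about the part of the loop
that the hunk does not show.

```cpp
#include <functional>

// Hypothetical standalone model; not the GCC implementation.
// COUNT elements of UNIT_BYTES bytes each are to be fused into wider
// integer elements.  TARGET_SUPPORTS is a stand-in for the mode and
// permute checks done by the real function.
bool
model_can_duplicate_and_interleave (
  unsigned count, unsigned unit_bytes,
  const std::function<bool (unsigned)> &target_supports)
{
  unsigned elt_bytes = count * unit_bytes;
  unsigned nvectors = 1;
  for (;;)
    {
      // The check added by the patch: COUNT / NVECTORS == 0 means
      // there are no elements left to fuse together.
      if (nvectors > count)
        return false;
      if (target_supports (elt_bytes))
        return true;
      // Assumed continuation of the loop: give up on odd sizes,
      // otherwise halve the element size and double the vector count.
      if (elt_bytes % 2 != 0)
        return false;
      elt_bytes /= 2;
      nvectors *= 2;
    }
}
```

With count = 1 (a single-element vector) and a target that only supports
elements narrower than the original one, the model returns false on the second
iteration instead of going on to split the single element into sub-elements.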