On Tue, Jun 24, 2025 at 4:26 PM Karl Meakin <karl.mea...@arm.com> wrote:
>
> The function `vect_check_gather_scatter` requires the `base` of the load
> to be loop-invariant and the `off`set to be not loop-invariant. When faced
> with a scenario where `base` is not loop-invariant, instead of giving up
> immediately we can try swapping the `base` and `off`, if `off` is
> actually loop-invariant.
>
> Previously, it would only swap if `off` was the constant zero (and so
> trivially loop-invariant). This is too conservative: we can still
> perform the swap if `off` is a more complex but still loop-invariant
> expression, such as a variable defined outside of the loop.
>
> This allows loops like the function below to be vectorised, if the
> target has masked loads and sufficiently large vector registers (eg
> `-march=armv8-a+sve -msve-vector-bits=128`):
>
> ```c
> typedef struct Array {
>     int elems[3];
> } Array;
>
> int loop(Array **pp, int len, int idx) {
>     int nRet = 0;
>
>     for (int i = 0; i < len; i++) {
>         Array *p = pp[i];
>         if (p) {
>             nRet += p->elems[idx];
>         }
>     }
>
>     return nRet;
> }
> ```
>
> gcc/ChangeLog:
>         * tree-vect-data-refs.cc (vect_check_gather_scatter): Swap
>         `base` and `off` in more scenarios. Also assert at the end of
>         the function that `base` and `off` are loop-invariant and not
>         loop-invariant respectively.
>
> gcc/testsuite/ChangeLog:
>         * gcc.target/aarch64/sve/mask_load_2.c: Update tests.
> ---
>  .../gcc.target/aarch64/sve/mask_load_2.c      |  4 +--
>  gcc/tree-vect-data-refs.cc                    | 26 ++++++++-----------
>  2 files changed, 13 insertions(+), 17 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/mask_load_2.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/mask_load_2.c
> index 38fcf4f7206..66d95101a14 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/mask_load_2.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/mask_load_2.c
> @@ -19,5 +19,5 @@ int loop(Array **pp, int len, int idx) {
>      return nRet;
>  }
>
> -// { dg-final { scan-assembler-times {ld1w\tz[0-9]+\.d, p[0-7]/z} 0 } }
> -// { dg-final { scan-assembler-times {add\tz[0-9]+\.s, p[0-7]/m}  0 } }
> +// { dg-final { scan-assembler-times {ld1w\tz[0-9]+\.d, p[0-7]/z} 1 } }
> +// { dg-final { scan-assembler-times {add\tz[0-9]+\.s, p[0-7]/m}  1 } }
> diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> index ee040eb9888..d352ca8bcc3 100644
> --- a/gcc/tree-vect-data-refs.cc
> +++ b/gcc/tree-vect-data-refs.cc
> @@ -4659,26 +4659,19 @@ vect_check_gather_scatter (stmt_vec_info stmt_info, 
> loop_vec_info loop_vinfo,
>    if (off == NULL_TREE)
>      off = size_zero_node;
>
> -  /* If base is not loop invariant, either off is 0, then we start with just
> -     the constant offset in the loop invariant BASE and continue with base
> -     as OFF, otherwise give up.
> -     We could handle that case by gimplifying the addition of base + off
> -     into some SSA_NAME and use that as off, but for now punt.  */
> +  /* BASE must be loop invariant.  If it is not invariant, but OFF is, then 
> we
> +   * can fix that by swapping BASE and OFF.  */
>    if (!expr_invariant_in_loop_p (loop, base))
>      {
> -      if (!integer_zerop (off))
> +      if (!expr_invariant_in_loop_p (loop, off))
>         return false;
> -      off = base;
> -      base = size_int (pbytepos);
> -    }
> -  /* Otherwise put base + constant offset into the loop invariant BASE
> -     and continue with OFF.  */
> -  else
> -    {
> -      base = fold_convert (sizetype, base);
> -      base = size_binop (PLUS_EXPR, base, size_int (pbytepos));
> +
> +      std::swap (base, off);
>      }
>
> +  base = fold_convert (sizetype, base);
> +  base = size_binop (PLUS_EXPR, base, size_int (pbytepos));
> +
>    /* OFF at this point may be either a SSA_NAME or some tree expression
>       from get_inner_reference.  Try to peel off loop invariants from it
>       into BASE as long as possible.  */
> @@ -4856,6 +4849,9 @@ vect_check_gather_scatter (stmt_vec_info stmt_info, 
> loop_vec_info loop_vinfo,
>        offset_vectype = NULL_TREE;
>      }
>
> +  gcc_assert (expr_invariant_in_loop_p (loop, base));
> +  gcc_assert (!expr_invariant_in_loop_p (loop, off));

Those are reasonably expensive that we want to avoid them, please
make them gcc_checking_assert at least.

OK with that change.
Richard.

> +
>    info->ifn = ifn;
>    info->decl = decl;
>    info->base = base;
> --
> 2.45.2
>

Reply via email to