Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
> Hi,
> The patch folds:
> lhs = svld1rq ({-1, -1, -1, ...}, &v[0])
> into:
> lhs = vec_perm_expr<v, v, {0, 1, 2, 3, ... }>
> and expands the above vec_perm_expr using aarch64_expand_sve_dupq.
>
> With the patch, for the following test:
> #include <arm_sve.h>
> #include <arm_neon.h>
>
> svint32_t
> foo (int32x4_t x)
> {
>   return svld1rq (svptrue_b8 (), &x[0]);
> }
>
> it generates the following code:
> foo:
> .LFB4350:
>         dup     z0.q, z0.q[0]
>         ret
>
> and passes bootstrap+test on aarch64-linux-gnu.
> But I am not sure if the changes to aarch64_evpc_sve_tbl
> are correct.

Just in case: I was only using int32x4_t in the PR as an example.
The same thing should work for all element types.

>
> Thanks,
> Prathamesh
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 02e42a71e5e..e21bbec360c 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -1207,6 +1207,56 @@ public:
>      insn_code icode = code_for_aarch64_sve_ld1rq (e.vector_mode (0));
>      return e.use_contiguous_load_insn (icode);
>    }
> +
> +  gimple *
> +  fold (gimple_folder &f) const OVERRIDE
> +  {
> +    tree arg0 = gimple_call_arg (f.call, 0);
> +    tree arg1 = gimple_call_arg (f.call, 1);
> +
> +    /* Transform:
> +       lhs = svld1rq ({-1, -1, ... }, &v[0])
> +       into:
> +       lhs = vec_perm_expr<v, v, {0, 1, 2, 3, ...}>
> +       on a little-endian target.  */
> +
> +    if (!BYTES_BIG_ENDIAN
> +     && integer_all_onesp (arg0)
> +     && TREE_CODE (arg1) == ADDR_EXPR)
> +      {
> +     tree t = TREE_OPERAND (arg1, 0);
> +     if (TREE_CODE (t) == ARRAY_REF)
> +       {
> +         tree index = TREE_OPERAND (t, 1);
> +         t = TREE_OPERAND (t, 0);
> +         if (integer_zerop (index) && TREE_CODE (t) == VIEW_CONVERT_EXPR)
> +           {
> +             t = TREE_OPERAND (t, 0);
> +             tree vectype = TREE_TYPE (t);
> +             if (VECTOR_TYPE_P (vectype)
> +                 && known_eq (TYPE_VECTOR_SUBPARTS (vectype), 4u)
> +                 && wi::to_wide (TYPE_SIZE (vectype)) == 128)
> +               {

Since this is quite a specific pattern match, and since we now lower
arm_neon.h vld1* to normal gimple accesses, I think we should try the
“more generally” approach mentioned in the PR and see what the fallout
is.  That is, keep:

    if (!BYTES_BIG_ENDIAN
        && integer_all_onesp (arg0)

If those conditions pass, create an Advanced SIMD access at address arg1,
using similar code to the handling of:

     BUILTIN_VALL_F16 (LOAD1, ld1, 0, LOAD)
     BUILTIN_VDQ_I (LOAD1_U, ld1, 0, LOAD)
     BUILTIN_VALLP_NO_DI (LOAD1_P, ld1, 0, LOAD)

in aarch64_general_gimple_fold_builtin.  (Would be good to move the
common code to aarch64.c so that both files can use it.)
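
For example, the fold could look something like this (only a sketch,
completely untested; aarch64_vq_mode gives the 128-bit Advanced SIMD
mode for a given element mode, and the MEM_REF is built in the same
way as for the arm_neon.h vld1 folds, with element alignment only):

  gimple *
  fold (gimple_folder &f) const OVERRIDE
  {
    tree arg0 = gimple_call_arg (f.call, 0);
    tree arg1 = gimple_call_arg (f.call, 1);

    /* Transform:
       lhs = svld1rq ({-1, -1, ...}, arg1)
       into:
       tmp = mem_ref<vq_type> [(elem * {ref-all}) arg1]
       lhs = vec_perm_expr<tmp, tmp, {0, 1, 2, 3, ...}>
       on a little-endian target, where vq_type is the 128-bit Advanced
       SIMD vector type with the same element type as lhs.  */
    if (BYTES_BIG_ENDIAN || !integer_all_onesp (arg0))
      return NULL;

    tree lhs = gimple_call_lhs (f.call);
    tree lhs_type = TREE_TYPE (lhs);
    poly_uint64 lhs_len = TYPE_VECTOR_SUBPARTS (lhs_type);
    tree eltype = TREE_TYPE (lhs_type);

    machine_mode vq_mode
      = aarch64_vq_mode (SCALAR_TYPE_MODE (eltype)).require ();
    tree vq_type = build_vector_type_for_mode (eltype, vq_mode);

    /* Assume just element alignment, as for vld1.  */
    tree elt_ptr_type = build_pointer_type_for_mode (eltype, VOIDmode, true);
    tree zero = build_zero_cst (elt_ptr_type);
    tree access_type = build_aligned_type (vq_type, TYPE_ALIGN (eltype));

    tree tmp = make_ssa_name (access_type);
    gimple *load
      = gimple_build_assign (tmp, fold_build2 (MEM_REF, access_type,
                                               arg1, zero));
    gsi_insert_before (f.gsi, load, GSI_SAME_STMT);

    /* Select elements {0, 1, ..., source_nelts - 1}, repeated to fill
       the SVE vector.  */
    int source_nelts = TYPE_VECTOR_SUBPARTS (access_type).to_constant ();
    vec_perm_builder sel (lhs_len, source_nelts, 1);
    for (int i = 0; i < source_nelts; i++)
      sel.quick_push (i);

    vec_perm_indices indices (sel, 1, source_nelts);
    tree mask_type = build_vector_type (ssizetype, lhs_len);
    tree mask = vec_perm_indices_to_tree (mask_type, indices);
    return gimple_build_assign (lhs, VEC_PERM_EXPR, tmp, tmp, mask);
  }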

> +                 tree lhs = gimple_call_lhs (f.call);
> +                 tree lhs_type = TREE_TYPE (lhs);
> +                 int source_nelts = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
> +                 vec_perm_builder sel (TYPE_VECTOR_SUBPARTS (lhs_type), source_nelts, 1);
> +                 for (int i = 0; i < source_nelts; i++)
> +                   sel.quick_push (i);
> +
> +                 vec_perm_indices indices (sel, 1, source_nelts);
> +                 if (!can_vec_perm_const_p (TYPE_MODE (lhs_type), indices))
> +                   return NULL;

I don't think we need to check this: it should always be true.
Probably worth keeping as a gcc_checking_assert though.
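
I.e.:

    vec_perm_indices indices (sel, 1, source_nelts);
    gcc_checking_assert (can_vec_perm_const_p (TYPE_MODE (lhs_type),
                                               indices));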

> +
> +                 tree mask = vec_perm_indices_to_tree (lhs_type, indices);
> +                 return gimple_build_assign (lhs, VEC_PERM_EXPR, t, t, mask);
> +               }
> +           }
> +       }
> +      }
> +
> +    return NULL;
> +  }
>  };
>  
>  class svld1ro_impl : public load_replicate
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index f07330cff4f..af27f550be3 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -23002,8 +23002,32 @@ aarch64_evpc_sve_tbl (struct expand_vec_perm_d *d)
>  
>    machine_mode sel_mode = related_int_vector_mode (d->vmode).require ();
>    rtx sel = vec_perm_indices_to_rtx (sel_mode, d->perm);
> +
>    if (d->one_vector_p)
> -    emit_unspec2 (d->target, UNSPEC_TBL, d->op0, force_reg (sel_mode, sel));
> +    {
> +      bool use_dupq = false;
> +      /* Check whether sel is a dup vector with encoded elements {0, 1, 2, ..., nelts - 1}.  */
> +      if (GET_CODE (sel) == CONST_VECTOR
> +       && !GET_MODE_NUNITS (GET_MODE (sel)).is_constant ()
> +       && CONST_VECTOR_DUPLICATE_P (sel))
> +       {
> +         unsigned nelts = const_vector_encoded_nelts (sel);
> +         unsigned i;
> +         for (i = 0; i < nelts; i++)
> +           {
> +             rtx elem = CONST_VECTOR_ENCODED_ELT (sel, i);
> +             if (!(CONST_INT_P (elem) && INTVAL (elem) == i))
> +               break;
> +           }
> +         if (i == nelts)
> +           use_dupq = true;
> +       }
> +
> +      if (use_dupq)
> +     aarch64_expand_sve_dupq (d->target, GET_MODE (d->target), d->op0);
> +      else
> +     emit_unspec2 (d->target, UNSPEC_TBL, d->op0, force_reg (sel_mode, sel));
> +    }

This shouldn't be a TBL but a new operation, handled by its own
aarch64_evpc_sve_* routine.  The check for the mask should then
be done on d->perm, to detect whether the permutation is one
that the new routine supports.

I think the requirements are:

- !BYTES_BIG_ENDIAN
- the source must be an Advanced SIMD vector
- the destination must be an SVE vector
- the permutation must be a duplicate (tested in the code above)
- the number of “patterns” in the permutation must equal the number of
  source elements
- element X of the permutation must equal X (tested in the code above)

The existing aarch64_evpc_* routines expect the source and target modes
to be the same, so we should only call them when that's true.
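
Something like the following is the kind of thing I mean (again only a
sketch and untested; op_mode and op_vec_flags stand for however
expand_vec_perm_d ends up recording the source mode separately from
the target mode, which it doesn't do today):

  /* Try to implement D as a DUPQ of the leading 128-bit subvector,
     i.e. dup zd.q, zn.q[0].  */
  static bool
  aarch64_evpc_sve_dup (struct expand_vec_perm_d *d)
  {
    unsigned int npatterns = d->perm.encoding ().npatterns ();
    if (BYTES_BIG_ENDIAN
        || !d->one_vector_p
        || d->vec_flags != VEC_SVE_DATA
        || d->op_vec_flags != VEC_ADVSIMD
        || !known_eq (GET_MODE_BITSIZE (d->op_mode), 128)
        || d->perm.encoding ().nelts_per_pattern () != 1
        || !known_eq (npatterns, GET_MODE_NUNITS (d->op_mode)))
      return false;

    /* Element X of the permutation must equal X.  */
    for (unsigned int i = 0; i < npatterns; i++)
      if (!known_eq (d->perm[i], i))
        return false;

    if (d->testing_p)
      return true;

    aarch64_expand_sve_dupq (d->target, GET_MODE (d->target), d->op0);
    return true;
  }

aarch64_expand_vec_perm_const_1 would then try this routine before
falling through to the ones that require matching modes.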

Thanks,
Richard
