Richard Ball <richard.b...@arm.com> writes:
> ACLE has added intrinsics to bridge between SVE and Neon.
>
> The NEON_SVE Bridge adds intrinsics that allow conversions between NEON and
> SVE vectors.
>
> This patch adds support to GCC for the following 3 intrinsics:
> svset_neonq, svget_neonq and svdup_neonq
>
> gcc/ChangeLog:
>
>       * config.gcc: Adds new header to config.
>       * config/aarch64/aarch64-builtins.cc (enum aarch64_type_qualifiers):
>       Moved to header file.
>       (ENTRY): Likewise.
>       (enum aarch64_simd_type): Likewise.
>       (struct aarch64_simd_type_info): Remove static.
>       (GTY): Likewise.
>       * config/aarch64/aarch64-c.cc (aarch64_pragma_aarch64):
>       Defines pragma for arm_neon_sve_bridge.h.
>       * config/aarch64/aarch64-sve-builtins-base.h: New intrinsics.
>       * config/aarch64/aarch64-sve-builtins-base.cc
>       (class svget_neonq_impl): New intrinsic implementation.
>       (class svset_neonq_impl): Likewise.
>       (class svdup_neonq_impl): Likewise.
>       (NEON_SVE_BRIDGE_FUNCTION): New intrinsics.
>       * config/aarch64/aarch64-sve-builtins-functions.h
>       (NEON_SVE_BRIDGE_FUNCTION): Defines macro for NEON_SVE_BRIDGE
>       functions.
>       * config/aarch64/aarch64-sve-builtins-shapes.h: New shapes.
>       * config/aarch64/aarch64-sve-builtins-shapes.cc
>       (parse_element_type): Add NEON element types.
>       (parse_type): Likewise.
>       (struct get_neonq_def): Defines function shape for get_neonq.
>       (struct set_neonq_def): Defines function shape for set_neonq.
>       (struct dup_neonq_def): Defines function shape for dup_neonq.
>       * config/aarch64/aarch64-sve-builtins.cc
>       (DEF_SVE_TYPE_SUFFIX): Changed to be called through
>       SVE_NEON macro.
>       (DEF_SVE_NEON_TYPE_SUFFIX): Defines macro for NEON_SVE_BRIDGE
>       type suffixes.
>       (DEF_NEON_SVE_FUNCTION): Defines macro for NEON_SVE_BRIDGE
>       functions.
>       (function_resolver::infer_neon128_vector_type): Infers type suffix
>       for overloaded functions.
>       (init_neon_sve_builtins): Initialise neon_sve_bridge_builtins for LTO.
>       (handle_arm_neon_sve_bridge_h): Handles #pragma arm_neon_sve_bridge.h.
>       * config/aarch64/aarch64-sve-builtins.def
>       (DEF_SVE_NEON_TYPE_SUFFIX): Macro for handling neon_sve type suffixes.
>       (bf16): Replace entry with neon-sve entry.
>       (f16): Likewise.
>       (f32): Likewise.
>       (f64): Likewise.
>       (s8): Likewise.
>       (s16): Likewise.
>       (s32): Likewise.
>       (s64): Likewise.
>       (u8): Likewise.
>       (u16): Likewise.
>       (u32): Likewise.
>       (u64): Likewise.
>       * config/aarch64/aarch64-sve-builtins.h
>       (GCC_AARCH64_SVE_BUILTINS_H): Include aarch64-builtins.h.
>       (ENTRY): Add aarch64_simd_type definition.
>       (enum aarch64_simd_type): Add neon information to type_suffix_info.
>       (struct type_suffix_info): New function.
>       * config/aarch64/aarch64-sve.md
>       (@aarch64_sve_get_neonq_<mode>): New intrinsic insn for big endian.
>       (@aarch64_sve_set_neonq_<mode>): Likewise.
>       * config/aarch64/aarch64.cc 
>       (aarch64_init_builtins): Add call to init_neon_sve_builtins.
>       * config/aarch64/iterators.md: Add UNSPEC_SET_NEONQ.
>       * config/aarch64/aarch64-builtins.h: New file.
>       * config/aarch64/aarch64-neon-sve-bridge-builtins.def: New file.
>       * config/aarch64/arm_neon_sve_bridge.h: New file.
>
> gcc/testsuite/ChangeLog:
>
>       * gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: Include the
>       arm_neon_sve_bridge.h header file.
>       * gcc.dg/torture/neon-sve-bridge.c: New test.
>       * gcc.target/aarch64/sve/acle/asm/dup_neonq_bf16.c: New test.
>       * gcc.target/aarch64/sve/acle/asm/dup_neonq_f16.c: New test.
>       * gcc.target/aarch64/sve/acle/asm/dup_neonq_f32.c: New test.
>       * gcc.target/aarch64/sve/acle/asm/dup_neonq_f64.c: New test.
>       * gcc.target/aarch64/sve/acle/asm/dup_neonq_s16.c: New test.
>       * gcc.target/aarch64/sve/acle/asm/dup_neonq_s32.c: New test.
>       * gcc.target/aarch64/sve/acle/asm/dup_neonq_s64.c: New test.
>       * gcc.target/aarch64/sve/acle/asm/dup_neonq_s8.c: New test.
>       * gcc.target/aarch64/sve/acle/asm/dup_neonq_u16.c: New test.
>       * gcc.target/aarch64/sve/acle/asm/dup_neonq_u32.c: New test.
>       * gcc.target/aarch64/sve/acle/asm/dup_neonq_u64.c: New test.
>       * gcc.target/aarch64/sve/acle/asm/dup_neonq_u8.c: New test.
>       * gcc.target/aarch64/sve/acle/asm/get_neonq_bf16.c: New test.
>       * gcc.target/aarch64/sve/acle/asm/get_neonq_f16.c: New test.
>       * gcc.target/aarch64/sve/acle/asm/get_neonq_f32.c: New test.
>       * gcc.target/aarch64/sve/acle/asm/get_neonq_f64.c: New test.
>       * gcc.target/aarch64/sve/acle/asm/get_neonq_s16.c: New test.
>       * gcc.target/aarch64/sve/acle/asm/get_neonq_s32.c: New test.
>       * gcc.target/aarch64/sve/acle/asm/get_neonq_s64.c: New test.
>       * gcc.target/aarch64/sve/acle/asm/get_neonq_s8.c: New test.
>       * gcc.target/aarch64/sve/acle/asm/get_neonq_u16.c: New test.
>       * gcc.target/aarch64/sve/acle/asm/get_neonq_u32.c: New test.
>       * gcc.target/aarch64/sve/acle/asm/get_neonq_u64.c: New test.
>       * gcc.target/aarch64/sve/acle/asm/get_neonq_u8.c: New test.
>       * gcc.target/aarch64/sve/acle/asm/set_neonq_bf16.c: New test.
>       * gcc.target/aarch64/sve/acle/asm/set_neonq_f16.c: New test.
>       * gcc.target/aarch64/sve/acle/asm/set_neonq_f32.c: New test.
>       * gcc.target/aarch64/sve/acle/asm/set_neonq_f64.c: New test.
>       * gcc.target/aarch64/sve/acle/asm/set_neonq_s16.c: New test.
>       * gcc.target/aarch64/sve/acle/asm/set_neonq_s32.c: New test.
>       * gcc.target/aarch64/sve/acle/asm/set_neonq_s64.c: New test.
>       * gcc.target/aarch64/sve/acle/asm/set_neonq_s8.c: New test.
>       * gcc.target/aarch64/sve/acle/asm/set_neonq_u16.c: New test.
>       * gcc.target/aarch64/sve/acle/asm/set_neonq_u32.c: New test.
>       * gcc.target/aarch64/sve/acle/asm/set_neonq_u64.c: New test.
>       * gcc.target/aarch64/sve/acle/asm/set_neonq_u8.c: New test.
>       * gcc.target/aarch64/sve/acle/general-c/dup_neonq_1.c: New test.
>       * gcc.target/aarch64/sve/acle/general-c/get_neonq_1.c: New test.
>       * gcc.target/aarch64/sve/acle/general-c/set_neonq_1.c: New test.

Thanks, looks great.  OK for trunk with the trivial changes below.
No need to repost unless you want to.
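
For reference, the bridge intrinsics this patch implements are used along
these lines (an illustrative sketch based on the ACLE spec; the intrinsic
names are real, the wrapper function names are made up):

    #include <arm_neon_sve_bridge.h>

    /* Extract the low 128 bits of an SVE vector as a NEON vector.  */
    float32x4_t
    low_quad (svfloat32_t sve_vec)
    {
      return svget_neonq_f32 (sve_vec);
    }

    /* Insert a NEON vector into the low 128 bits of an SVE vector,
       keeping the remaining lanes of SVE_VEC unchanged.  */
    svfloat32_t
    set_quad (svfloat32_t sve_vec, float32x4_t neon_vec)
    {
      return svset_neonq_f32 (sve_vec, neon_vec);
    }

    /* Duplicate a 128-bit NEON vector across the whole SVE vector.  */
    svfloat32_t
    dup_quad (float32x4_t neon_vec)
    {
      return svdup_neonq_f32 (neon_vec);
    }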

> [...]
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 4e5a88aa03a994e42f6b2528c44547410390b26c..a7766a7d468cdeccdde8907632b3f702969f4bd7 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -44,6 +44,7 @@
>  #include "aarch64-sve-builtins-shapes.h"
>  #include "aarch64-sve-builtins-base.h"
>  #include "aarch64-sve-builtins-functions.h"
> +#include "aarch64-builtins.h"
>  #include "ssa.h"
>  #include "gimple-fold.h"
>  
> @@ -1099,6 +1100,116 @@ public:
>    }
>  };
>  
> +class svget_neonq_impl : public function_base
> +{
> +public:
> +  gimple *
> +  fold (gimple_folder &f) const override
> +  {
> +    if (BYTES_BIG_ENDIAN)
> +      return NULL;
> +    tree rhs_sve_vector = gimple_call_arg (f.call, 0);
> +    tree rhs_vector = build3 (BIT_FIELD_REF, TREE_TYPE (f.lhs),
> +                          rhs_sve_vector, bitsize_int (128), bitsize_int (0));
> +    return gimple_build_assign (f.lhs, rhs_vector);
> +  }
> +  rtx
> +  expand (function_expander &e) const override
> +  {
> +    if (BYTES_BIG_ENDIAN)
> +      {
> +     machine_mode mode = e.vector_mode (0);
> +     insn_code icode = code_for_aarch64_sve_get_neonq (mode);
> +     unsigned int nunits = 128 / GET_MODE_UNIT_BITSIZE (mode);
> +     rtx indices = aarch64_gen_stepped_int_parallel
> +       (nunits, nunits - 1 , -1);

Formatting nit, should be: (nunits, nunits - 1, -1);
(with no space before the comma)

> +
> +     e.add_output_operand (icode);
> +     e.add_input_operand (icode, e.args[0]);
> +     e.add_fixed_operand (indices);
> +     return e.generate_insn (icode);
> +      }
> +    return simplify_gen_subreg (e.result_mode (), e.args[0],
> +                             GET_MODE (e.args[0]), 0);
> +  }
> +};
> +
> +class svset_neonq_impl : public function_base
> +{
> +public:
> +  rtx
> +  expand (function_expander &e) const override
> +  {
> +    machine_mode mode = e.vector_mode (0);
> +    rtx_vector_builder builder (VNx16BImode, 16, 2);
> +    for (unsigned int i = 0; i < 16; i++)
> +      builder.quick_push (CONST1_RTX (BImode));
> +    for (unsigned int i = 0; i < 16; i++)
> +      builder.quick_push (CONST0_RTX (BImode));
> +    e.args.quick_push (builder.build ());
> +    if (BYTES_BIG_ENDIAN)
> +      return e.use_exact_insn (code_for_aarch64_sve_set_neonq (mode));
> +    insn_code icode = code_for_vcond_mask (mode, mode);
> +    e.args[1] = lowpart_subreg (mode, e.args[1], GET_MODE (e.args[1]));
> +    e.add_output_operand (icode);
> +    e.add_input_operand (icode, e.args[1]);
> +    e.add_input_operand (icode, e.args[0]);
> +    e.add_input_operand (icode, e.args[2]);
> +    return e.generate_insn (icode);
> +  }
> +};
> +
> +class svdup_neonq_impl : public function_base
> +{
> +public:
> +  gimple *
> +  fold (gimple_folder &f) const override
> +  {
> +    if (BYTES_BIG_ENDIAN)
> +      {
> +     return NULL;
> +      }

Formatting nit, should just be:

    if (BYTES_BIG_ENDIAN)
      return NULL;

without the braces.

> +    tree rhs_vector = gimple_call_arg (f.call, 0);
> +    unsigned HOST_WIDE_INT neon_nelts
> +      = TYPE_VECTOR_SUBPARTS (TREE_TYPE (rhs_vector)).to_constant ();
> +    poly_uint64 sve_nelts;
> +    sve_nelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (f.lhs));

It doesn't seem necessary to split this over two lines.  Seems simpler as:

    poly_uint64 sve_nelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (f.lhs));

> +    vec_perm_builder builder (sve_nelts, neon_nelts, 1);
> +    for (unsigned int i = 0; i < neon_nelts; i++)
> +      {
> +     builder.quick_push (i);
> +      }

Formatting nit, should be:

    for (unsigned int i = 0; i < neon_nelts; i++)
      builder.quick_push (i);

> +    vec_perm_indices indices (builder, 1, neon_nelts);
> +    tree perm_type = build_vector_type (ssizetype, sve_nelts);
> +    return gimple_build_assign (f.lhs, VEC_PERM_EXPR,
> +                             rhs_vector,
> +                             rhs_vector,
> +                             vec_perm_indices_to_tree (perm_type, indices));
> +  }
> +  rtx

Formatting nit, missing blank line between functions.

> +  expand (function_expander &e) const override
> +  {
> +    insn_code icode;

Very minor, but there doesn't seem any need to forward-declare this
variable.  Can just be:

> +    machine_mode mode = e.vector_mode (0);
> +    if (BYTES_BIG_ENDIAN)
> +      {
> +     icode = code_for_aarch64_vec_duplicate_vq_be (mode);

      insn_code icode = code_for_aarch64_vec_duplicate_vq_be (mode);

here and similarly below.

> +     unsigned int nunits = 128 / GET_MODE_UNIT_BITSIZE (mode);
> +     rtx indices = aarch64_gen_stepped_int_parallel
> +       (nunits, nunits - 1 , -1);

Should be: (nunits, nunits - 1, -1);

> +
> +     e.add_output_operand (icode);
> +     e.add_input_operand (icode, e.args[0]);
> +     e.add_fixed_operand (indices);
> +     return e.generate_insn (icode);
> +      }
> +    icode = code_for_aarch64_vec_duplicate_vq_le (mode);
> +    e.add_output_operand (icode);
> +    e.add_input_operand (icode, e.args[0]);
> +    return e.generate_insn (icode);
> +  }
> +};
> +
>  class svindex_impl : public function_base
>  {
>  public:
> @@ -3122,5 +3233,8 @@ FUNCTION (svzip1q, unspec_based_function, (UNSPEC_ZIP1Q, UNSPEC_ZIP1Q,
>  FUNCTION (svzip2, svzip_impl, (1))
>  FUNCTION (svzip2q, unspec_based_function, (UNSPEC_ZIP2Q, UNSPEC_ZIP2Q,
>                                          UNSPEC_ZIP2Q))
> +NEON_SVE_BRIDGE_FUNCTION (svget_neonq, svget_neonq_impl,)
> +NEON_SVE_BRIDGE_FUNCTION (svset_neonq, svset_neonq_impl,)
> +NEON_SVE_BRIDGE_FUNCTION (svdup_neonq, svdup_neonq_impl,)
>  
>  } /* end namespace aarch64_sve */
> [...]
> @@ -2092,6 +2109,35 @@ function_resolver::infer_integer_vector_type (unsigned int argno)
>    return type;
>  }
>  
> +/* Require argument ARGNO to have some form of NEON128 vector type.  Return the
> +   associated type suffix on success.
> +   Report an error and return NUM_TYPE_SUFFIXES on failure.  */
> +type_suffix_index
> +function_resolver::infer_neon128_vector_type (unsigned int argno)
> +{
> +  tree actual = get_argument_type (argno);
> +  if (actual == error_mark_node)
> +    return NUM_TYPE_SUFFIXES;
> +
> +  for (unsigned int suffix_i = 0; suffix_i < NUM_TYPE_SUFFIXES; ++suffix_i)
> +    {
> +      int neon_index = type_suffixes[suffix_i].neon128_type;
> +      if (neon_index != ARM_NEON_H_TYPES_LAST)
> +     {
> +       tree type = aarch64_simd_types[neon_index].itype;
> +       if (type && matches_type_p (type, actual))
> +         {
> +           return type_suffix_index (suffix_i);
> +         }

Formatting, should be:

          if (type && matches_type_p (type, actual))
            return type_suffix_index (suffix_i);

> +     }
> +    }
> +
> +  error_at (location, "passing %qT to argument %d of %qE, which"
> +         " expects a 128 bit NEON vector type", actual, argno + 1, fndecl);
> +  return NUM_TYPE_SUFFIXES;
> +}
> +
> +
>  /* Like infer_vector_type, but also require the type to be an unsigned
>     integer.  */
>  type_suffix_index
> @@ -4457,6 +4503,14 @@ init_builtins ()
>      }
>  }
>  
> +/* Initialize the SVE-NEON Bridge at start-up, if LTO is required.  */
> +void
> +init_neon_sve_builtins ()
> +{
> +  if (in_lto_p)
> +    handle_arm_neon_sve_bridge_h ();
> +}
> +

I think the existing in_lto_p if statement in init_builtins should do this,
rather than adding a new "external" function that aarch64_init_builtins
has to call.
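
That is, something like (untested sketch; exact placement depends on the
surrounding code in init_builtins):

    /* In init_builtins, alongside the existing LTO handling:  */
    if (in_lto_p)
      {
        handle_arm_sve_h ();
        handle_arm_neon_sve_bridge_h ();
      }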

Thanks,
Richard
