On December 12, 2019 4:10:33 PM GMT+01:00, Richard Sandiford <richard.sandif...@arm.com> wrote: >One problem with adding an N-bit vector extension to an existing >architecture is to decide how N-bit vectors should be passed to >functions and returned from functions. Allowing all N-bit vector >types to be passed in registers breaks backwards compatibility, >since N-bit vectors could be used (and emulated) before the vector >extension was added. But always passing N-bit vectors on the >stack would be inefficient for things like vector libm functions. > >For SVE we took the compromise position of predefining new SVE vector >types that are distinct from all existing vector types, including >GNU-style vectors. The new types are passed and returned in an >efficient way while existing vector types are passed and returned >in the traditional way. In the right circumstances, the two types >are inter-convertible. > >The SVE types are created using: > > vectype = build_distinct_type_copy (vectype); > SET_TYPE_STRUCTURAL_EQUALITY (vectype); > TYPE_ARTIFICIAL (vectype) = 1; > >The C frontend maintains this distinction, using VIEW_CONVERT_EXPR >to convert from one type to the other. However, the distinction can >be lost during gimple, which treats two vector types with the same >mode, number of elements, and element type as equivalent. And for >most targets that's the right thing to do.
And why's that a problem? The difference appears only in the function call ABI which is determined by the function signature rather than types or modes of the actual arguments? Richard. >This patch therefore adds a hook that lets the target choose >whether such vector types are indeed equivalent. > >Note that the new tests fail for -mabi=ilp32 in the same way as other >ACLE-based tests. I'm still planning to fix that as a follow-on. > >Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? > >Richard > > >2019-12-12 Richard Sandiford <richard.sandif...@arm.com> > >gcc/ > * target.def (compatible_vector_types_p): New target hook. > * hooks.h (hook_bool_const_tree_const_tree_true): Declare. > * hooks.c (hook_bool_const_tree_const_tree_true): New function. > * doc/tm.texi.in (TARGET_COMPATIBLE_VECTOR_TYPES_P): New hook. > * doc/tm.texi: Regenerate. > * gimple-expr.c: Include target.h. > (useless_type_conversion_p): Use targetm.compatible_vector_types_p. > * config/aarch64/aarch64.c (aarch64_compatible_vector_types_p): New > function. > (TARGET_COMPATIBLE_VECTOR_TYPES_P): Define. > * config/aarch64/aarch64-sve-builtins.cc >(gimple_folder::convert_pred): > Use the original predicate if it already has a suitable type. > >gcc/testsuite/ > * gcc.target/aarch64/sve/pcs/gnu_vectors_1.c: New test. > * gcc.target/aarch64/sve/pcs/gnu_vectors_2.c: Likewise. > >Index: gcc/target.def >=================================================================== >--- gcc/target.def 2019-11-30 18:48:18.531984101 +0000 >+++ gcc/target.def 2019-12-12 15:07:43.960415368 +0000 >@@ -3411,6 +3411,29 @@ must have move patterns for this mode.", > hook_bool_mode_false) > > DEFHOOK >+(compatible_vector_types_p, >+ "Return true if there is no target-specific reason for treating\n\ >+vector types @var{type1} and @var{type2} as distinct types. The >caller\n\ >+has already checked for target-independent reasons, meaning that >the\n\ >+types are known to have the same mode, to have the same number of >elements,\n\ >+and to have what the caller considers to be compatible element >types.\n\ >+\n\ >+The main reason for defining this hook is to reject pairs of types\n\ >+that are handled differently by the target's calling convention.\n\ >+For example, when a new @var{N}-bit vector architecture is added\n\ >+to a target, the target may want to handle normal @var{N}-bit\n\ >+@code{VECTOR_TYPE} arguments and return values in the same way as\n\ >+before, to maintain backwards compatibility. However, it may also\n\ >+provide new, architecture-specific @code{VECTOR_TYPE}s that are >passed\n\ >+and returned in a more efficient way. It is then important to >maintain\n\ >+a distinction between the ``normal'' @code{VECTOR_TYPE}s and the >new\n\ >+architecture-specific ones.\n\ >+\n\ >+The default implementation returns true, which is correct for most >targets.", >+ bool, (const_tree type1, const_tree type2), >+ hook_bool_const_tree_const_tree_true) >+ >+DEFHOOK > (vector_alignment, > "This hook can be used to define the alignment for a vector of type\n\ >@var{type}, in order to comply with a platform ABI. The default is >to\n\ >Index: gcc/hooks.h >=================================================================== >--- gcc/hooks.h 2019-11-04 21:13:57.727755548 +0000 >+++ gcc/hooks.h 2019-12-12 15:07:43.960415368 +0000 >@@ -45,6 +45,7 @@ extern bool hook_bool_uint_uint_mode_fal > extern bool hook_bool_uint_mode_true (unsigned int, machine_mode); > extern bool hook_bool_tree_false (tree); > extern bool hook_bool_const_tree_false (const_tree); >+extern bool hook_bool_const_tree_const_tree_true (const_tree, >const_tree); > extern bool hook_bool_tree_true (tree); > extern bool hook_bool_const_tree_true (const_tree); > extern bool hook_bool_gsiptr_false (gimple_stmt_iterator *); >Index: gcc/hooks.c >=================================================================== >--- gcc/hooks.c 2019-11-04 21:13:57.727755548 +0000 >+++ gcc/hooks.c 2019-12-12 15:07:43.960415368 +0000 >@@ -313,6 +313,12 @@ hook_bool_const_tree_false (const_tree) > } > > bool >+hook_bool_const_tree_const_tree_true (const_tree, const_tree) >+{ >+ return true; >+} >+ >+bool > hook_bool_tree_true (tree) > { > return true; >Index: gcc/doc/tm.texi.in >=================================================================== >--- gcc/doc/tm.texi.in 2019-11-30 18:48:18.523984157 +0000 >+++ gcc/doc/tm.texi.in 2019-12-12 15:07:43.956415393 +0000 >@@ -3365,6 +3365,8 @@ stack. > > @hook TARGET_VECTOR_MODE_SUPPORTED_P > >+@hook TARGET_COMPATIBLE_VECTOR_TYPES_P >+ > @hook TARGET_ARRAY_MODE > > @hook TARGET_ARRAY_MODE_SUPPORTED_P >Index: gcc/doc/tm.texi >=================================================================== >--- gcc/doc/tm.texi 2019-11-30 18:48:18.507984271 +0000 >+++ gcc/doc/tm.texi 2019-12-12 15:07:43.952415419 +0000 >@@ -4324,6 +4324,27 @@ insns involving vector mode @var{mode}. > must have move patterns for this mode. > @end deftypefn > >+@deftypefn {Target Hook} bool TARGET_COMPATIBLE_VECTOR_TYPES_P >(const_tree @var{type1}, const_tree @var{type2}) >+Return true if there is no target-specific reason for treating >+vector types @var{type1} and @var{type2} as distinct types. The >caller >+has already checked for target-independent reasons, meaning that the >+types are known to have the same mode, to have the same number of >elements, >+and to have what the caller considers to be compatible element types. >+ >+The main reason for defining this hook is to reject pairs of types >+that are handled differently by the target's calling convention. >+For example, when a new @var{N}-bit vector architecture is added >+to a target, the target may want to handle normal @var{N}-bit >+@code{VECTOR_TYPE} arguments and return values in the same way as >+before, to maintain backwards compatibility. However, it may also >+provide new, architecture-specific @code{VECTOR_TYPE}s that are passed >+and returned in a more efficient way. It is then important to >maintain >+a distinction between the ``normal'' @code{VECTOR_TYPE}s and the new >+architecture-specific ones. >+ >+The default implementation returns true, which is correct for most >targets. >+@end deftypefn >+ >@deftypefn {Target Hook} opt_machine_mode TARGET_ARRAY_MODE >(machine_mode @var{mode}, unsigned HOST_WIDE_INT @var{nelems}) > Return the mode that GCC should use for an array that has > @var{nelems} elements, with each element having mode @var{mode}. >Index: gcc/gimple-expr.c >=================================================================== >--- gcc/gimple-expr.c 2019-10-08 09:23:31.902529513 +0100 >+++ gcc/gimple-expr.c 2019-12-12 15:07:43.956415393 +0000 >@@ -37,6 +37,7 @@ Software Foundation; either version 3, o > #include "tree-pass.h" > #include "stringpool.h" > #include "attribs.h" >+#include "target.h" > > /* ----- Type related ----- */ > >@@ -147,10 +148,12 @@ useless_type_conversion_p (tree outer_ty > > /* Recurse for vector types with the same number of subparts. */ > else if (TREE_CODE (inner_type) == VECTOR_TYPE >- && TREE_CODE (outer_type) == VECTOR_TYPE >- && TYPE_PRECISION (inner_type) == TYPE_PRECISION (outer_type)) >- return useless_type_conversion_p (TREE_TYPE (outer_type), >- TREE_TYPE (inner_type)); >+ && TREE_CODE (outer_type) == VECTOR_TYPE) >+ return (known_eq (TYPE_VECTOR_SUBPARTS (inner_type), >+ TYPE_VECTOR_SUBPARTS (outer_type)) >+ && useless_type_conversion_p (TREE_TYPE (outer_type), >+ TREE_TYPE (inner_type)) >+ && targetm.compatible_vector_types_p (inner_type, outer_type)); > > else if (TREE_CODE (inner_type) == ARRAY_TYPE > && TREE_CODE (outer_type) == ARRAY_TYPE) >Index: gcc/config/aarch64/aarch64.c >=================================================================== >--- gcc/config/aarch64/aarch64.c 2019-12-10 16:45:56.338226712 +0000 >+++ gcc/config/aarch64/aarch64.c 2019-12-12 15:07:43.940415503 +0000 >@@ -2120,6 +2120,20 @@ aarch64_fntype_abi (const_tree fntype) > return default_function_abi; > } > >+/* Implement TARGET_COMPATIBLE_VECTOR_TYPES_P. */ >+ >+static bool >+aarch64_compatible_vector_types_p (const_tree type1, const_tree type2) >+{ >+ unsigned int num_zr1 = 0, num_pr1 = 0, num_zr2 = 0, num_pr2 = 0; >+ if (aarch64_sve_argument_p (type1, &num_zr1, &num_pr1) >+ != aarch64_sve_argument_p (type2, &num_zr2, &num_pr2)) >+ return false; >+ >+ gcc_assert (num_zr1 == num_zr2 && num_pr1 == num_pr2); >+ return true; >+} >+ > /* Return true if we should emit CFI for register REGNO. */ > > static bool >@@ -22031,6 +22045,9 @@ #define TARGET_USE_BLOCKS_FOR_CONSTANT_P > #undef TARGET_VECTOR_MODE_SUPPORTED_P > #define TARGET_VECTOR_MODE_SUPPORTED_P aarch64_vector_mode_supported_p > >+#undef TARGET_COMPATIBLE_VECTOR_TYPES_P >+#define TARGET_COMPATIBLE_VECTOR_TYPES_P >aarch64_compatible_vector_types_p >+ > #undef TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT > #define TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT \ > aarch64_builtin_support_vector_misalignment >Index: gcc/config/aarch64/aarch64-sve-builtins.cc >=================================================================== >--- gcc/config/aarch64/aarch64-sve-builtins.cc 2019-12-06 >18:22:12.072859530 +0000 >+++ gcc/config/aarch64/aarch64-sve-builtins.cc 2019-12-12 >15:07:43.936415528 +0000 >@@ -2251,9 +2251,13 @@ tree > gimple_folder::convert_pred (gimple_seq &stmts, tree vectype, > unsigned int argno) > { >- tree predtype = truth_type_for (vectype); > tree pred = gimple_call_arg (call, argno); >- return gimple_build (&stmts, VIEW_CONVERT_EXPR, predtype, pred); >+ if (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (pred)), >+ TYPE_VECTOR_SUBPARTS (vectype))) >+ return pred; >+ >+ return gimple_build (&stmts, VIEW_CONVERT_EXPR, >+ truth_type_for (vectype), pred); > } > > /* Return a pointer to the address in a contiguous load or store, >Index: gcc/testsuite/gcc.target/aarch64/sve/pcs/gnu_vectors_1.c >=================================================================== >--- /dev/null 2019-09-17 11:41:18.176664108 +0100 >+++ gcc/testsuite/gcc.target/aarch64/sve/pcs/gnu_vectors_1.c 2019-12-12 >15:07:43.972415287 +0000 >@@ -0,0 +1,99 @@ >+/* { dg-options "-O -msve-vector-bits=256 -fomit-frame-pointer" } */ >+ >+#include <arm_sve.h> >+ >+typedef float16_t float16x16_t __attribute__((vector_size (32))); >+typedef float32_t float32x8_t __attribute__((vector_size (32))); >+typedef float64_t float64x4_t __attribute__((vector_size (32))); >+typedef int8_t int8x32_t __attribute__((vector_size (32))); >+typedef int16_t int16x16_t __attribute__((vector_size (32))); >+typedef int32_t int32x8_t __attribute__((vector_size (32))); >+typedef int64_t int64x4_t __attribute__((vector_size (32))); >+typedef uint8_t uint8x32_t __attribute__((vector_size (32))); >+typedef uint16_t uint16x16_t __attribute__((vector_size (32))); >+typedef uint32_t uint32x8_t __attribute__((vector_size (32))); >+typedef uint64_t uint64x4_t __attribute__((vector_size (32))); >+ >+void float16_callee (float16x16_t); >+void float32_callee (float32x8_t); >+void float64_callee (float64x4_t); >+void int8_callee (int8x32_t); >+void int16_callee (int16x16_t); >+void int32_callee (int32x8_t); >+void int64_callee (int64x4_t); >+void uint8_callee (uint8x32_t); >+void uint16_callee (uint16x16_t); >+void uint32_callee (uint32x8_t); >+void uint64_callee (uint64x4_t); >+ >+void >+float16_caller (void) >+{ >+ float16_callee (svdup_f16 (1.0)); >+} >+ >+void >+float32_caller (void) >+{ >+ float32_callee (svdup_f32 (2.0)); >+} >+ >+void >+float64_caller (void) >+{ >+ float64_callee (svdup_f64 (3.0)); >+} >+ >+void >+int8_caller (void) >+{ >+ int8_callee (svindex_s8 (0, 1)); >+} >+ >+void >+int16_caller (void) >+{ >+ int16_callee (svindex_s16 (0, 2)); >+} >+ >+void >+int32_caller (void) >+{ >+ int32_callee (svindex_s32 (0, 3)); >+} >+ >+void >+int64_caller (void) >+{ >+ int64_callee (svindex_s64 (0, 4)); >+} >+ >+void >+uint8_caller (void) >+{ >+ uint8_callee (svindex_u8 (1, 1)); >+} >+ >+void >+uint16_caller (void) >+{ >+ uint16_callee (svindex_u16 (1, 2)); >+} >+ >+void >+uint32_caller (void) >+{ >+ uint32_callee (svindex_u32 (1, 3)); >+} >+ >+void >+uint64_caller (void) >+{ >+ uint64_callee (svindex_u64 (1, 4)); >+} >+ >+/* { dg-final { scan-assembler-times {\tst1b\tz[0-9]+\.b, p[0-7], >\[x0\]} 2 } } */ >+/* { dg-final { scan-assembler-times {\tst1h\tz[0-9]+\.h, p[0-7], >\[x0\]} 3 } } */ >+/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7], >\[x0\]} 3 } } */ >+/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7], >\[x0\]} 3 } } */ >+/* { dg-final { scan-assembler-times {\tadd\tx0, sp, #?16\n} 11 } } */ >Index: gcc/testsuite/gcc.target/aarch64/sve/pcs/gnu_vectors_2.c >=================================================================== >--- /dev/null 2019-09-17 11:41:18.176664108 +0100 >+++ gcc/testsuite/gcc.target/aarch64/sve/pcs/gnu_vectors_2.c 2019-12-12 >15:07:43.972415287 +0000 >@@ -0,0 +1,99 @@ >+/* { dg-options "-O -msve-vector-bits=256 -fomit-frame-pointer" } */ >+ >+#include <arm_sve.h> >+ >+typedef float16_t float16x16_t __attribute__((vector_size (32))); >+typedef float32_t float32x8_t __attribute__((vector_size (32))); >+typedef float64_t float64x4_t __attribute__((vector_size (32))); >+typedef int8_t int8x32_t __attribute__((vector_size (32))); >+typedef int16_t int16x16_t __attribute__((vector_size (32))); >+typedef int32_t int32x8_t __attribute__((vector_size (32))); >+typedef int64_t int64x4_t __attribute__((vector_size (32))); >+typedef uint8_t uint8x32_t __attribute__((vector_size (32))); >+typedef uint16_t uint16x16_t __attribute__((vector_size (32))); >+typedef uint32_t uint32x8_t __attribute__((vector_size (32))); >+typedef uint64_t uint64x4_t __attribute__((vector_size (32))); >+ >+void float16_callee (svfloat16_t); >+void float32_callee (svfloat32_t); >+void float64_callee (svfloat64_t); >+void int8_callee (svint8_t); >+void int16_callee (svint16_t); >+void int32_callee (svint32_t); >+void int64_callee (svint64_t); >+void uint8_callee (svuint8_t); >+void uint16_callee (svuint16_t); >+void uint32_callee (svuint32_t); >+void uint64_callee (svuint64_t); >+ >+void >+float16_caller (float16x16_t arg) >+{ >+ float16_callee (arg); >+} >+ >+void >+float32_caller (float32x8_t arg) >+{ >+ float32_callee (arg); >+} >+ >+void >+float64_caller (float64x4_t arg) >+{ >+ float64_callee (arg); >+} >+ >+void >+int8_caller (int8x32_t arg) >+{ >+ int8_callee (arg); >+} >+ >+void >+int16_caller (int16x16_t arg) >+{ >+ int16_callee (arg); >+} >+ >+void >+int32_caller (int32x8_t arg) >+{ >+ int32_callee (arg); >+} >+ >+void >+int64_caller (int64x4_t arg) >+{ >+ int64_callee (arg); >+} >+ >+void >+uint8_caller (uint8x32_t arg) >+{ >+ uint8_callee (arg); >+} >+ >+void >+uint16_caller (uint16x16_t arg) >+{ >+ uint16_callee (arg); >+} >+ >+void >+uint32_caller (uint32x8_t arg) >+{ >+ uint32_callee (arg); >+} >+ >+void >+uint64_caller (uint64x4_t arg) >+{ >+ uint64_callee (arg); >+} >+ >+/* { dg-final { scan-assembler-times {\tld1b\tz0\.b, p[0-7]/z, \[x0\]} >2 } } */ >+/* { dg-final { scan-assembler-times {\tld1h\tz0\.h, p[0-7]/z, \[x0\]} >3 } } */ >+/* { dg-final { scan-assembler-times {\tld1w\tz0\.s, p[0-7]/z, \[x0\]} >3 } } */ >+/* { dg-final { scan-assembler-times {\tld1d\tz0\.d, p[0-7]/z, \[x0\]} >3 } } */ >+/* { dg-final { scan-assembler-not {\tst1[bhwd]\t} } } */