"Andre Vieira (lists) via Gcc-patches" <[email protected]> writes:
> On 22/11/2021 11:41, Richard Biener wrote:
>>
>>> On 18/11/2021 11:05, Richard Biener wrote:
>>>> This is a good shout and made me think about something I hadn't before... I
>>>> thought I could handle the vector forms later, but the problem is that if I
>>>> add support for the scalar, it will stop the vectorizer.  It seems
>>>> vectorizable_call expects all arguments to have the same type, which doesn't
>>>> work with the workaround of passing the integer type as an operand.
>> We already special case some IFNs there (masked load/store and gather)
>> to ignore some args, so that would just add to this set.
>>
>> Richard.
> Hi,
>
> Reworked it to add support for the new IFN to the vectorizer.  I was
> initially trying to make vectorizable_call and
> vectorizable_internal_function handle IFNs with different input types more
> generically, using the information we have in the <IFN>_direct structs
> about which operands to get the modes from.  Unfortunately, that
> wasn't straightforward because vectorizable_call assumes operands
> have the same type and uses the type of the DEF_STMT_INFO of the
> non-constant operands (either the output operand or the non-constant
> inputs) to determine the type of constants.  I assume there is some reason
> why we use the DEF_STMT_INFO rather than always using
> get_vectype_for_scalar_type on the argument types.  That is why I ended up
> with this sort of half-way mix of both, which still leaves room to add more
> IFNs that don't take inputs of the same type, but requires adding a bit of
> special casing similar to the IFN_FTRUNC_INT and masking ones.
>
> Bootstrapped on aarch64-none-linux.
Still leaving the match.pd stuff to Richard, but otherwise:
> index bdc8ba3576cf2c9b4ae96b45a382234e4e25b13f..51f00344b02d0d1d4adf97463f6a46f9fd0fb43f 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -160,7 +160,11 @@ (define_mode_iterator VHSDF_HSDF [(V4HF "TARGET_SIMD_F16INST")
>  				  SF DF])
>
> ;; Scalar and vector modes for SF, DF.
> -(define_mode_iterator VSFDF [V2SF V4SF V2DF DF SF])
> +(define_mode_iterator VSFDF [ (V2SF "TARGET_SIMD")
Nit: excess space between [ and (.
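i.e. (just the spacing changed):

(define_mode_iterator VSFDF [(V2SF "TARGET_SIMD")
			     (V4SF "TARGET_SIMD")
			     (V2DF "TARGET_SIMD")
			     (DF "TARGET_FLOAT")
			     (SF "TARGET_FLOAT")])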
> +			      (V4SF "TARGET_SIMD")
> +			      (V2DF "TARGET_SIMD")
> +			      (DF "TARGET_FLOAT")
> +			      (SF "TARGET_FLOAT")])
>
> ;; Advanced SIMD single Float modes.
> (define_mode_iterator VDQSF [V2SF V4SF])
> […]
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 41f1850bf6e95005647ca97a495a97d7e184d137..d50d09b0ae60d98537b9aece4396a490f33f174c 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -6175,6 +6175,15 @@ operands; otherwise, it may not.
>
> This pattern is not allowed to @code{FAIL}.
>
> +@cindex @code{ftrunc@var{m}@var{n}2} instruction pattern
> +@item @samp{ftrunc@var{m}@var{n}2}
> +Truncate operand 1 to a @var{n} mode signed integer, towards zero, and store
> +the result in operand 0.  Both operands have mode @var{m}, which is a scalar or
> +vector floating-point mode.  Exception must be thrown if operand 1 does not fit
Maybe “An exception must be raised”? “thrown” makes it sound like a
signal must be raised or C++ exception thrown.
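E.g. something like this for the whole sentence (just a suggestion,
wording untested against the rest of the manual):

  An exception must be raised if operand 1 does not fit in an @var{n} mode
  signed integer, just as it would be for a separate floating-point to
  integer conversion.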
> +in a @var{n} mode signed integer as it would have if the truncation happened
> +through separate floating point to integer conversion.
> +
> +
> @cindex @code{round@var{m}2} instruction pattern
> @item @samp{round@var{m}2}
> Round operand 1 to the nearest integer, rounding away from zero in the
> […]
> @@ -3688,6 +3712,15 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
>  	  != CODE_FOR_nothing);
> }
>
> +static bool direct_ftrunc_int_optab_supported_p (convert_optab optab,
> +						 tree_pair types,
> +						 optimization_type opt_type)
Formatting nit: should be a line break after “bool”
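i.e.:

static bool
direct_ftrunc_int_optab_supported_p (convert_optab optab, tree_pair types,
				     optimization_type opt_type)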
> +{
> +  return (convert_optab_handler (optab, TYPE_MODE (types.first),
> +				 TYPE_MODE (element_type (types.second)),
> +				 opt_type) != CODE_FOR_nothing);
> +}
> +
> #define direct_unary_optab_supported_p direct_optab_supported_p
> #define direct_binary_optab_supported_p direct_optab_supported_p
> #define direct_ternary_optab_supported_p direct_optab_supported_p
> […]
> diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c b/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..b93304eb2acb3d3d954eebee51d77ff23fee68ac
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c
> @@ -0,0 +1,47 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -march=armv8.5-a" } */
> +/* { dg-require-effective-target aarch64_frintnzx_ok } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +#define TEST(name,float_type,int_type)					\
> +void									\
> +name (float_type * __restrict__ x, float_type * __restrict__ y, int n)	\
> +{									\
> +  for (int i = 0; i < n; ++i)						\
> +    {									\
> +      int_type x_i = x[i];						\
> +      y[i] = (float_type) x_i;						\
> +    }									\
> +}
> +
> +/*
> +** f1:
> +** ...
> +** frint32z v0.4s, v0.4s
I don't think we can rely on v0 being used here. v[0-9]+\.4s would
be safer.
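i.e.:

**	frint32z	v[0-9]+\.4s, v[0-9]+\.4s

(and similarly for the other functions).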
> +** ...
> +*/
> +TEST(f1, float, int)
> +
> +/*
> +** f2:
> +** ...
> +** frint64z v0.4s, v0.4s
> +** ...
> +*/
> +TEST(f2, float, long long)
> +
> +/*
> +** f3:
> +** ...
> +** frint32z v0.2d, v0.2d
> +** ...
> +*/
> +TEST(f3, double, int)
> +
> +/*
> +** f4:
> +** ...
> +** frint64z v0.2d, v0.2d
> +** ...
> +*/
> +TEST(f4, double, long long)
> […]
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index 03cc7267cf80d4ce73c0d89ab86b07e84752456a..35bb1f70f7b173ad0d1e9f70ce0ac9da891dbe62 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -1625,7 +1625,8 @@ vect_finish_stmt_generation (vec_info *vinfo,
>
> static internal_fn
> vectorizable_internal_function (combined_fn cfn, tree fndecl,
> -				tree vectype_out, tree vectype_in)
> +				tree vectype_out, tree vectype_in,
> +				tree *vectypes)
> {
> internal_fn ifn;
> if (internal_fn_p (cfn))
> @@ -1637,8 +1638,12 @@ vectorizable_internal_function (combined_fn cfn, tree fndecl,
>        const direct_internal_fn_info &info = direct_internal_fn (ifn);
>        if (info.vectorizable)
> 	{
> -	  tree type0 = (info.type0 < 0 ? vectype_out : vectype_in);
> -	  tree type1 = (info.type1 < 0 ? vectype_out : vectype_in);
> +	  tree type0 = (info.type0 < 0 ? vectype_out : vectypes[info.type0]);
> +	  if (!type0)
> +	    type0 = vectype_in;
> +	  tree type1 = (info.type1 < 0 ? vectype_out : vectypes[info.type1]);
> +	  if (!type1)
> +	    type1 = vectype_in;
> 	  if (direct_internal_fn_supported_p (ifn, tree_pair (type0, type1),
> 					      OPTIMIZE_FOR_SPEED))
> 	    return ifn;
> @@ -3252,16 +3257,31 @@ vectorizable_call (vec_info *vinfo,
> rhs_type = unsigned_type_node;
> }
>
> -  int mask_opno = -1;
> +  /* The argument that is not of the same type as the others.  */
> +  int diff_opno = -1;
> +  bool masked = false;
>    if (internal_fn_p (cfn))
> -    mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
> +    {
> +      if (cfn == CFN_FTRUNC_INT)
> +	/* For FTRUNC this represents the argument that carries the type of the
> +	   intermediate signed integer.  */
> +	diff_opno = 1;
> +      else
> +	{
> +	  /* For masked operations this represents the argument that carries the
> +	     mask.  */
> +	  diff_opno = internal_fn_mask_index (as_internal_fn (cfn));
> +	  masked = diff_opno >= 0;
> +	}
> +    }
I think it would be cleaner not to process argument 1 at all for
CFN_FTRUNC_INT. There's no particular need to vectorise it.
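E.g. something like this at the top of the argument loop (completely
untested sketch, reusing the patch's diff_opno/masked variables):

      if ((int) i == diff_opno && !masked)
	/* Operand 1 of IFN_FTRUNC_INT only carries the type of the
	   intermediate integer; there's nothing to vectorise.  */
	continue;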
>    for (i = 0; i < nargs; i++)
>      {
> -      if ((int) i == mask_opno)
> +      if ((int) i == diff_opno && masked)
> 	{
> -	  if (!vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_opno,
> -				       &op, &slp_op[i], &dt[i], &vectypes[i]))
> +	  if (!vect_check_scalar_mask (vinfo, stmt_info, slp_node,
> +				       diff_opno, &op, &slp_op[i], &dt[i],
> +				       &vectypes[i]))
> 	    return false;
> 	  continue;
> 	}
> […]
> diff --git a/gcc/tree.c b/gcc/tree.c
> index 845228a055b2cfac0c9ca8c0cda1b9df4b0095c6..f1e9a1eb48769cb11aa69730e2480ed5522f78c1 100644
> --- a/gcc/tree.c
> +++ b/gcc/tree.c
> @@ -6645,11 +6645,11 @@ valid_constant_size_p (const_tree size,
> cst_size_error *perr /* = NULL */)
> return true;
> }
>
> -/* Return the precision of the type, or for a complex or vector type the
> - precision of the type of its elements. */
> +/* Return the type, or for a complex or vector type the type of its
> + elements. */
>
> -unsigned int
> -element_precision (const_tree type)
> +tree
> +element_type (const_tree type)
> {
>    if (!TYPE_P (type))
>      type = TREE_TYPE (type);
> @@ -6657,7 +6657,16 @@ element_precision (const_tree type)
>    if (code == COMPLEX_TYPE || code == VECTOR_TYPE)
>      type = TREE_TYPE (type);
>
> -  return TYPE_PRECISION (type);
> +  return (tree) type;
I think we should stick a const_cast in element_precision and make
element_type take a plain “type”. As it stands element_type is an
implicit const_cast for many cases.
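I.e. something like (untested sketch, using CONST_CAST_TREE as elsewhere
in GCC):

tree
element_type (tree type)
{
  if (!TYPE_P (type))
    type = TREE_TYPE (type);

  enum tree_code code = TREE_CODE (type);
  if (code == COMPLEX_TYPE || code == VECTOR_TYPE)
    type = TREE_TYPE (type);

  return type;
}

unsigned int
element_precision (const_tree type)
{
  return TYPE_PRECISION (element_type (CONST_CAST_TREE (type)));
}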
Thanks,
Richard
> +}
> +
> +/* Return the precision of the type, or for a complex or vector type the
> + precision of the type of its elements. */
> +
> +unsigned int
> +element_precision (const_tree type)
> +{
> +  return TYPE_PRECISION (element_type (type));
> }
>
> /* Return true if CODE represents an associative tree code. Otherwise