> -----Original Message----- > From: Alex Coplan <[email protected]> > Sent: 20 October 2025 10:17 > To: Tamar Christina <[email protected]> > Cc: [email protected]; nd <[email protected]>; Richard Earnshaw > <[email protected]>; [email protected]; > [email protected]; Wilco Dijkstra > <[email protected]>; Alice Carlotti <[email protected]> > Subject: Re: [PATCH]AArch64: Extend intrinsics framework to account for > merging predications without gp [PR121604] > > Hi Tamar, > > On 16/10/2025 14:02, Alex Coplan wrote: > > Hi Tamar, > > > > Thanks for doing this, and sorry for not getting round to doing it > > sooner myself. This generally looks really good. Some comments below. > > > > On 14/10/2025 16:31, Tamar Christina wrote: > > > In PR121604 the problem was noted that currently the SVE intrinsics > > > infrastructure assumes that for any predicated operation that the GP is at > the > > > first argument position which has a svbool_t or for a unary merging > operation > > > that it's in the second position. > > > > > > However you have intrinsics like fmov_lane which have an svbool_t but it's > not > > > > I guess you mean pmov_lane here? Just figured it's worth mentioning so > that we > > don't end up with a misleading commit message. > > I see this has now landed as > g:d1965b1fd8938f35f78be503e36b98b406751e21 > (thanks), but it looks like you missed making these tweaks to the commit > message. Of course not much to be done now, just one to bear in mind > for next time there are comments on the cover letter. >
Hi Alex, I simply forgot to update the one in the cover letter but corrected the rest. Tamar > Cheers, > Alex > > > > > > a GP but instead it's the inactive lanes. > > > > I think for svpmov_lane intrinsics like: > > > > svuint16_t svpmov_lane[_u16]_m(svuint16_t zd, svbool_t pn, uint64_t > imm); > > > > the svbool_t operand isn't the inactive value, but just a non-governing > > predicate operand (in this case, the predicate to move destructively > > into the vector result). > > > > > > > > You also have instructions like BRKB which work only on predicates so it > > > incorrectly determines the first operand to be the GP, while that's also > > > the > > > inactive lanes. > > > > > > However during apply_predication we do have the information about > where the GP > > > is. This patch re-organizes the code to record this information into the > > > function_instance such that folders have access to this information. > > > > > > For functions that are outliers like pmov_lane we can now override the > > > availability of the intrinsics having a GP. > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > > > > > Ok for master? > > > > > > Thanks, > > > Tamar > > > > > > gcc/ChangeLog: > > > > > > * config/aarch64/aarch64-sve-builtins-shapes.cc (apply_predication): > > > Store gp_index. > > > (struct pmov_to_vector_lane_def): Mark instruction as has no GP. > > > * config/aarch64/aarch64-sve-builtins.h > (function_instance::gp_value, > > > function_instance::inactive_values, function_instance::gp_index, > > > function_shape::has_gp_argument_p): New. > > > * config/aarch64/aarch64-sve-builtins.cc (gimple_folder::fold_pfalse): > > > Simplify code and use GP helpers. > > > > > > gcc/testsuite/ChangeLog: > > > > > > * gcc.target/aarch64/sve/pr121604_brk.c: New test. > > > * gcc.target/aarch64/sve2/pr121604_pmov.c: New test. > > > > > > Co-authored-by: Jennifer Schmitz <[email protected]> > > > > > > --- > > > diff --git a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc > b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc > > > index > 74a3338e9556c4369a3b1e6e864acfba4ced0e3a..0ee60e89aa6905ec5a281 > a017fde1c8963a03466 100644 > > > --- a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc > > > +++ b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc > > > @@ -21,7 +21,10 @@ > > > #include "system.h" > > > #include "coretypes.h" > > > #include "tm.h" > > > +#include "basic-block.h" > > > #include "tree.h" > > > +#include "function.h" > > > +#include "gimple.h" > > > #include "rtl.h" > > > #include "tm_p.h" > > > #include "memmodel.h" > > > @@ -68,23 +71,36 @@ za_group_is_pure_overload (const > function_group_info &group) > > > types in ARGUMENT_TYPES. RETURN_TYPE is the type returned by the > > > function. */ > > > static void > > > -apply_predication (const function_instance &instance, tree return_type, > > > +apply_predication (function_instance &instance, tree return_type, > > > vec<tree> &argument_types) > > > { > > > + /* Initially mark the function as not being predicated. */ > > > + instance.gp_index = -1; > > > + > > > /* There are currently no SME ZA instructions that have both merging > > > and > > > unpredicated forms, so for simplicity, the predicates are always > > > included > > > in the original format string. */ > > > if (instance.pred != PRED_none && instance.pred != PRED_za_m) > > > { > > > argument_types.quick_insert (0, instance.gp_type ()); > > > + instance.gp_index = 0; > > > /* For unary merge operations, the first argument is a vector with > > > the same type as the result. For unary_convert_narrowt it also > > > provides the "bottom" half of active elements, and is present > > > for all types of predication. */ > > > auto nargs = argument_types.length () - 1; > > > if (instance.shape->has_merge_argument_p (instance, nargs)) > > > - argument_types.quick_insert (0, return_type); > > > + { > > > + argument_types.quick_insert (0, return_type); > > > + instance.gp_index = 1; > > > + } > > > } > > > + > > > + /* Check if the shape has overriden the GP argument. In this case > register > > > + the types, predicate the function but don't mark it as having a GP > > > and > > > + override what was set before. */ > > > > How about changing the second sentence to something like: > > > > /* In this case the predicate type we added above is a non-governing > > predicate operand (and there is no GP), so update the gp_index value > > accordingly. */ > > > > ? That makes it slightly clearer to me why it's correct to register the > > types but clear the gp_index field. > > > > > + if (!instance.shape->has_gp_argument_p (instance)) > > > + instance.gp_value_index = -1; > > > > I suppose this should be: > > > > instance.gp_index = -1; > > > > ? Did you send the right version of the patch? It seems like the patch > > wouldn't bootstrap as is. > > > > > } > > > > > > /* Parse and move past an element type in FORMAT and return it as a type > > > @@ -3332,6 +3348,14 @@ struct pmov_to_vector_lane_def : public > overloaded_base<0> > > > but it doesn't currently have the necessary information. */ > > > return c.require_immediate_range (1, 1, bytes - 1); > > > } > > > + > > > + /* This function has a predicate argument, and is a merging > > > instruction, > but > > > + the predicate is not a GP, it's the inactive lanes. */ > > > > As per my comment on the cover letter, for pmov_lane I think the > > predicate argument is just the (non-governing) predicate value to move > > into the vector result (not an inactive value). > > > > > + bool > > > + has_gp_argument_p (const function_instance &) const override > > > + { > > > + return false; > > > + } > > > }; > > > SHAPE (pmov_to_vector_lane) > > > > > > diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc > b/gcc/config/aarch64/aarch64-sve-builtins.cc > > > index > 4956e364929c6a958f37f4d5ab8af668682225a8..4805537231227c565ff2a > 5ef32f30b24017fb3ad 100644 > > > --- a/gcc/config/aarch64/aarch64-sve-builtins.cc > > > +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc > > > @@ -3632,24 +3632,22 @@ gimple_folder::redirect_pred_x () > > > gimple * > > > gimple_folder::fold_pfalse () > > > { > > > - if (pred == PRED_none) > > > + tree gp = gp_pred (call); > > > > s/gp_pred/gp_value/ ? > > > > > + /* If there isn't a GP then we can't do any folding as the instruction > > > isn't > > > + predicated. */ > > > + if (!gp) > > > return nullptr; > > > - tree arg0 = gimple_call_arg (call, 0); > > > + > > > if (pred == PRED_m) > > > { > > > - /* Unary function shapes with _m predication are folded to the > > > - inactive vector (arg0), while other function shapes are folded > > > - to op1 (arg1). */ > > > - tree arg1 = gimple_call_arg (call, 1); > > > - if (is_pfalse (arg1)) > > > - return fold_call_to (arg0); > > > - if (is_pfalse (arg0)) > > > - return fold_call_to (arg1); > > > + tree val = inactive_values (call); > > > + if (is_pfalse (gp)) > > > + return fold_call_to (val); > > > return nullptr; > > > } > > > - if ((pred == PRED_x || pred == PRED_z) && is_pfalse (arg0)) > > > + if ((pred == PRED_x || pred == PRED_z) && is_pfalse (gp)) > > > return fold_call_to (build_zero_cst (TREE_TYPE (lhs))); > > > - if (pred == PRED_implicit && is_pfalse (arg0)) > > > + if (pred == PRED_implicit && is_pfalse (gp)) > > > { > > > unsigned int flags = call_properties (); > > > /* Folding to lhs = {0, ...} is not appropriate for intrinsics with > > > diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h > b/gcc/config/aarch64/aarch64-sve-builtins.h > > > index > d6a58b450d6ddfd38788899b61b75c14e9472a99..fceb80f4d8e3e9ab1584 > 450c98d35f94c4202c8a 100644 > > > --- a/gcc/config/aarch64/aarch64-sve-builtins.h > > > +++ b/gcc/config/aarch64/aarch64-sve-builtins.h > > > @@ -403,6 +403,8 @@ public: > > > bool could_trap_p () const; > > > > > > vector_type_index gp_type_index () const; > > > + tree gp_value (gcall *) const; > > > + tree inactive_values (gcall *) const; > > > tree gp_type () const; > > > > > > unsigned int vectors_per_tuple () const; > > > @@ -436,6 +438,7 @@ public: > > > group_suffix_index group_suffix_id; > > > predication_index pred; > > > fpm_mode_index fpm_mode; > > > + int gp_index; > > > }; > > > > > > class registered_function; > > > @@ -801,6 +804,8 @@ public: > > > virtual bool has_merge_argument_p (const function_instance &, > > > unsigned int) const; > > > > > > + virtual bool has_gp_argument_p (const function_instance &) const; > > > + > > > virtual bool explicit_type_suffix_p (unsigned int) const = 0; > > > > > > /* True if the group suffix is present in overloaded names. > > > @@ -949,6 +954,34 @@ function_instance::gp_type () const > > > return acle_vector_types[0][gp_type_index ()]; > > > } > > > > > > +/* Return the tree value that should be used as the governing predicate > > > of > > > + this function. If none then return NULL_TREE. */ > > > +inline tree > > > +function_instance::gp_value (gcall *call) const > > > +{ > > > + if (pred == PRED_none || gp_index < 0) > > > + return NULL_TREE; > > > > I think the pred == PRED_none check is redundant here, since we already > > check it in the default definition of has_gp_argument_p and then set > > gp_index accordingly in apply_predication. > > > > > + > > > + return gimple_call_arg (call, gp_index); > > > +} > > > + > > > +/* Return the tree value that should be used the inactive lanes should > > > this > > > > "used for the inactive lanes"? > > > > > + function be a predicated function with a gp. If none then return > > > > Nit: could just say "Otherwise return NULL_TREE" for brevity, but it's fine > > either way. > > > > > + NULL_TREE. */ > > > +inline tree > > > +function_instance::inactive_values (gcall *call) const > > > +{ > > > + if (pred == PRED_none || gp_index < 0) > > > + return NULL_TREE; > > > > Likewise, the pred == PRED_none check is redundant. > > > > > + > > > + /* Function is unary with m predicate. */ > > > + if (gp_index == 1) > > > + return gimple_call_arg (call, 0); > > > + > > > + /* Else the inactive values are the next element. */ > > > + return gimple_call_arg (call, 1); > > > +} > > > + > > > /* If the function operates on tuples of vectors, return the number > > > of vectors in the tuples, otherwise return 1. */ > > > inline unsigned int > > > @@ -1123,6 +1156,14 @@ function_shape::has_merge_argument_p > (const function_instance &instance, > > > return nargs == 1 && instance.pred == PRED_m; > > > } > > > > > > +/* Return true if INSTANCE has an predicate argument that can be used as > the global > > > + predicate. */ > > > > s/an predicate/a predicate/ and s/global/governing/. > > > > The patch LGTM with those changes, thanks! > > > > Alex > > > > > +inline bool > > > +function_shape::has_gp_argument_p (const function_instance > &instance) const > > > +{ > > > + return instance.pred != PRED_none; > > > +} > > > + > > > /* Return the mode of the result of a call. */ > > > inline machine_mode > > > function_expander::result_mode () const > > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr121604_brk.c > b/gcc/testsuite/gcc.target/aarch64/sve/pr121604_brk.c > > > new file mode 100644 > > > index > 0000000000000000000000000000000000000000..a474a20554d30a19df > 701f89e16d10f5692b29fd > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr121604_brk.c > > > @@ -0,0 +1,25 @@ > > > +/* { dg-do compile } */ > > > +/* { dg-options "-O2" } */ > > > +/* { dg-final { check-function-bodies "**" "" "" } } */ > > > + > > > +#include <arm_sve.h> > > > + > > > +/* > > > +** foo: > > > +** ptrue p0\.b, all > > > +** brkb p0\.b, p0/z, p0\.b > > > +** ret > > > +*/ > > > +svbool_t foo () { > > > + return svbrkb_b_m (svpfalse (), svptrue_b8 (), svptrue_b8 ()); > > > +} > > > + > > > +/* > > > +** bar: > > > +** ptrue p0\.b, all > > > +** brka p0\.b, p0/z, p0\.b > > > +** ret > > > +*/ > > > +svbool_t bar () { > > > + return svbrka_b_m (svpfalse (), svptrue_b8 (), svptrue_b8 ()); > > > +} > > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pr121604_pmov.c > b/gcc/testsuite/gcc.target/aarch64/sve2/pr121604_pmov.c > > > new file mode 100644 > > > index > 0000000000000000000000000000000000000000..16844ee4add3bc0f7c6 > b251b88913bf7c29b54cf > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pr121604_pmov.c > > > @@ -0,0 +1,16 @@ > > > +/* { dg-do compile } */ > > > +/* { dg-options "-O2 -march=armv8.2-a+sve2p1" } */ > > > +/* { dg-final { check-function-bodies "**" "" "" } } */ > > > + > > > +#include <arm_sve.h> > > > + > > > +/* > > > +** f: > > > +** pfalse p([0-7]+)\.b > > > +** mov z0\.b, #-1 > > > +** pmov z0\[1\], p\1\.d > > > +** ret > > > +*/ > > > +svuint64_t f () { > > > + return svpmov_lane_u64_m (svdup_u64 (~0UL), svpfalse (), 1); > > > +} > > > > > > > > > --
