> -----Original Message-----
> From: Richard Biener <rguent...@suse.de>
> Sent: Thursday, August 21, 2025 11:55 AM
> To: Tamar Christina <tamar.christ...@arm.com>
> Cc: Richard Biener <richard.guent...@gmail.com>; gcc-patches@gcc.gnu.org; nd
> <n...@arm.com>
> Subject: RE: [PATCH 2/5]middle-end: Add detection for add halfing and
> narrowing instruction
>
> On Wed, 20 Aug 2025, Tamar Christina wrote:
>
> > > -----Original Message-----
> > > From: Richard Biener <richard.guent...@gmail.com>
> > > Sent: Wednesday, August 20, 2025 1:48 PM
> > > To: Tamar Christina <tamar.christ...@arm.com>
> > > Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; rguent...@suse.de
> > > Subject: Re: [PATCH 2/5]middle-end: Add detection for add halfing and
> > > narrowing instruction
> > >
> > > On Tue, Aug 19, 2025 at 6:29 AM Tamar Christina <tamar.christ...@arm.com>
> > > wrote:
> > > >
> > > > This adds support for detection of the ADDHN pattern in the vectorizer.
> > > >
> > > > Concretely try to detect
> > > >
> > > > _1 = (W)a
> > > > _2 = (W)b
> > > > _3 = _1 + _2
> > > > _4 = _3 >> (precision(a) / 2)
> > > > _5 = (N)_4
> > > >
> > > > where
> > > > W = precision (a) * 2
> > > > N = precision (a) / 2
> > >
> > > Hmm.  Is the widening because of UB with signed overflow?  The
> > > actual carry of a + b doesn't end up in (N)(_3 >> (precision(a) / 2)).
> > > I'd expect that for unsigned a and b you could see just
> > > (N)((a + b) >> (precision(a) / 2)), no?  Integer promotion would make
> > > this difficult to write, of course, unless the patterns exist for SImode
> > > -> HImode add-high.
> >
> > I guess the description is inaccurate; addhn extracts explicitly the high
> > bits of the results.  So the high bits will end up in the low part.
> >
> > > Also ...
> > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > > > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> > > > -m32, -m64 and no issues.
> > > >
> > > > Ok for master?
> > > > Tests in the next patch which adds the optabs to AArch64.
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >         * internal-fn.def (VEC_ADD_HALFING_NARROW,
> > > >         IFN_VEC_ADD_HALFING_NARROW_LO, IFN_VEC_ADD_HALFING_NARROW_HI,
> > > >         IFN_VEC_ADD_HALFING_NARROW_EVEN,
> > > >         IFN_VEC_ADD_HALFING_NARROW_ODD): New.
> > > >         * internal-fn.cc (commutative_binary_fn_p): Add
> > > >         IFN_VEC_ADD_HALFING_NARROW, IFN_VEC_ADD_HALFING_NARROW_LO and
> > > >         IFN_VEC_ADD_HALFING_NARROW_EVEN.
> > > >         (commutative_ternary_fn_p): Add IFN_VEC_ADD_HALFING_NARROW_HI,
> > > >         IFN_VEC_ADD_HALFING_NARROW_ODD.
> > > >         * match.pd (add_half_narrowing_p): New.
> > > >         * optabs.def (vec_saddh_narrow_optab, vec_saddh_narrow_hi_optab,
> > > >         vec_saddh_narrow_lo_optab, vec_saddh_narrow_odd_optab,
> > > >         vec_saddh_narrow_even_optab, vec_uaddh_narrow_optab,
> > > >         vec_uaddh_narrow_hi_optab, vec_uaddh_narrow_lo_optab,
> > > >         vec_uaddh_narrow_odd_optab, vec_uaddh_narrow_even_optab): New.
> > > >         * tree-vect-patterns.cc (gimple_add_half_narrowing_p): New.
> > > >         (vect_recog_add_halfing_narrow_pattern): New.
> > > >         (vect_vect_recog_func_ptrs): Use it.
> > > >         * doc/generic.texi: Document them.
> > > >         * doc/md.texi: Likewise.
> > > >
> > > > ---
> > > > diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi
> > > > index d4ac580a7a8b9cd339d26cb97f7eb963f83746a4..b32d99d4d1aad244a493d8f67b66151ff5363d0e 100644
> > > > --- a/gcc/doc/generic.texi
> > > > +++ b/gcc/doc/generic.texi
> > > > @@ -1834,6 +1834,11 @@ a value from @code{enum annot_expr_kind}, the third is an @code{INTEGER_CST}.
> > > > @tindex IFN_VEC_WIDEN_MINUS_LO
> > > > @tindex IFN_VEC_WIDEN_MINUS_EVEN
> > > > @tindex IFN_VEC_WIDEN_MINUS_ODD
> > > > +@tindex IFN_VEC_ADD_HALFING_NARROW
> > > > +@tindex IFN_VEC_ADD_HALFING_NARROW_HI
> > > > +@tindex IFN_VEC_ADD_HALFING_NARROW_LO
> > > > +@tindex IFN_VEC_ADD_HALFING_NARROW_EVEN
> > > > +@tindex IFN_VEC_ADD_HALFING_NARROW_ODD
> > > > @tindex VEC_UNPACK_HI_EXPR
> > > > @tindex VEC_UNPACK_LO_EXPR
> > > > @tindex VEC_UNPACK_FLOAT_HI_EXPR
> > > > @@ -1956,6 +1961,51 @@ vector of @code{N/2} subtractions.  In the case of
> > > > vector are subtracted from the odd @code{N/2} of the first to produce the
> > > > vector of @code{N/2} subtractions.
> > > >
> > > > +@item IFN_VEC_ADD_HALFING_NARROW
> > > > +This internal function represents widening vector addition of two input
> > > > +vectors, extracting the top half of the result and narrowing that value
> > > > +to a type half the width of the original input.
> > > > +Concretely it does @code{(|bits(a)/2|)((a w+ b) >> |bits(a)/2|)}.  Its operands
> > > > +are vectors that contain the same number of elements (@code{N}) of the same
> > > > +integral type.  The result is a vector that contains the same number (@code{N})
> > > > +of elements as the input vectors, of an integral type half as wide.  If the
> > > > +current target does not implement the corresponding optabs, the
> > > > +vectorizer may choose to split it into either a pair
> > > > +of @code{IFN_VEC_ADD_HALFING_NARROW_HI} and @code{IFN_VEC_ADD_HALFING_NARROW_LO}
> > > > +or @code{IFN_VEC_ADD_HALFING_NARROW_EVEN} and
> > > > +@code{IFN_VEC_ADD_HALFING_NARROW_ODD}, depending on what optabs the target
> > > > +implements.
> > > > +
> > > > +@item IFN_VEC_ADD_HALFING_NARROW_HI
> > > > +@itemx IFN_VEC_ADD_HALFING_NARROW_LO
> > > > +This internal function represents widening vector addition of two input
> > > > +vectors, extracting the top half of the result and narrowing that value
> > > > +to a type half the width of the original input, inserting the result as
> > > > +the high or low half of the result vector.
> > > > +Concretely it does @code{(|bits(a)/2|)((a w+ b) >> |bits(a)/2|)}.  Their
> > > > +operands are vectors that contain the same number of elements (@code{N}) of the
> > > > +same integral type.  The result is a vector that contains half as many elements,
> > > > +of an integral type half as wide.  In the case of
> > > > +@code{IFN_VEC_ADD_HALFING_NARROW_HI} the high @code{N/2} elements of the result
> > > > +are inserted into the given result vector with the low elements left untouched.
> > > > +The operation is an RMW.  In the case of @code{IFN_VEC_ADD_HALFING_NARROW_LO} the
> > > > +low @code{N/2} elements of the result are used as the full result.
> > > > +
> > > > +@item IFN_VEC_ADD_HALFING_NARROW_EVEN
> > > > +@itemx IFN_VEC_ADD_HALFING_NARROW_ODD
> > > > +This internal function represents widening vector addition of two input
> > > > +vectors, extracting the top half of the result and narrowing that value
> > > > +to a type half the width of the original input, inserting the result as
> > > > +the even or odd parts of the result vector.
> > > > +Concretely it does @code{(|bits(a)/2|)((a w+ b) >> |bits(a)/2|)}.  Their
> > > > +operands are vectors that contain the same number of elements (@code{N}) of the
> > > > +same integral type.  The result is a vector that contains half as many elements,
> > > > +of an integral type half as wide.
In the case of
> > > > +@code{IFN_VEC_ADD_HALFING_NARROW_ODD} the odd @code{N/2} elements of the result
> > > > +are inserted into the given result vector with the even elements left untouched.
> > > > +The operation is an RMW.  In the case of @code{IFN_VEC_ADD_HALFING_NARROW_EVEN}
> > > > +the even @code{N/2} elements of the result are used as the full result.
> > > > +
> > > > @item VEC_UNPACK_HI_EXPR
> > > > @itemx VEC_UNPACK_LO_EXPR
> > > > These nodes represent unpacking of the high and low parts of the input vector,
> > > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > > > index aba93f606eca59d31c103a05b2567fd4f3be55f3..cb691b56f137a0037f5178ba853911df5a65e5a7 100644
> > > > --- a/gcc/doc/md.texi
> > > > +++ b/gcc/doc/md.texi
> > > > @@ -6087,6 +6087,21 @@ vectors with N signed/unsigned elements of size S@.  Find the absolute
> > > > difference between operands 1 and 2 and widen the resulting elements.
> > > > Put the N/2 results of size 2*S in the output vector (operand 0).
> > > >
> > > > +@cindex @code{vec_saddh_narrow_hi_@var{m}} instruction pattern
> > > > +@cindex @code{vec_saddh_narrow_lo_@var{m}} instruction pattern
> > > > +@cindex @code{vec_uaddh_narrow_hi_@var{m}} instruction pattern
> > > > +@cindex @code{vec_uaddh_narrow_lo_@var{m}} instruction pattern
> > > > +@item @samp{vec_uaddh_narrow_hi_@var{m}}, @samp{vec_uaddh_narrow_lo_@var{m}}
> > > > +@itemx @samp{vec_saddh_narrow_hi_@var{m}}, @samp{vec_saddh_narrow_lo_@var{m}}
> > > > +@item @samp{vec_uaddh_narrow_even_@var{m}}, @samp{vec_uaddh_narrow_odd_@var{m}}
> > > > +@itemx @samp{vec_saddh_narrow_even_@var{m}}, @samp{vec_saddh_narrow_odd_@var{m}}
> > > > +Signed/Unsigned widening add long extract high half and narrow.  Operands 1 and
> > > > +2 are vectors with N signed/unsigned elements of size S@.  Add the
Add the > high/low > > > > +elements of 1 and 2 together in a widened precision, extract the top > > > > half and > > > > +narrow the result to half the size of S@ abd store the results in the > > > > output > > > > +vector (operand 0). Congretely it does > > > > +@code{((|bits(a)/2|)((a w+ b) >> |bits(a)/2|)} > > > > + > > > > @cindex @code{vec_addsub@var{m}3} instruction pattern > > > > @item @samp{vec_addsub@var{m}3} > > > > Alternating subtract, add with even lanes doing subtract and odd > > > > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc > > > > index > > > > 83438dd2ff57474cec999adaeabe92c0540e2a51..e600dbc4b3a0b27f78be00d5 > > > 2f7f6a54a13d7241 100644 > > > > --- a/gcc/internal-fn.cc > > > > +++ b/gcc/internal-fn.cc > > > > @@ -4442,6 +4442,9 @@ commutative_binary_fn_p (internal_fn fn) > > > > case IFN_VEC_WIDEN_PLUS_HI: > > > > case IFN_VEC_WIDEN_PLUS_EVEN: > > > > case IFN_VEC_WIDEN_PLUS_ODD: > > > > + case IFN_VEC_ADD_HALFING_NARROW: > > > > + case IFN_VEC_ADD_HALFING_NARROW_LO: > > > > + case IFN_VEC_ADD_HALFING_NARROW_EVEN: > > > > return true; > > > > > > > > default: > > > > @@ -4462,6 +4465,8 @@ commutative_ternary_fn_p (internal_fn fn) > > > > case IFN_FNMA: > > > > case IFN_FNMS: > > > > case IFN_UADDC: > > > > + case IFN_VEC_ADD_HALFING_NARROW_HI: > > > > + case IFN_VEC_ADD_HALFING_NARROW_ODD: > > > > > > Huh, how can this be correct? Are they not binary? > > > > Correct they're ternary. 
> > > >
> > > >       return true;
> > > >
> > > >     default:
> > > > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > > > index 69677dd10b980c83dec36487b1214ff066f4789b..152895f043b3ca60294b79c8301c6ff4014b955d 100644
> > > > --- a/gcc/internal-fn.def
> > > > +++ b/gcc/internal-fn.def
> > > > @@ -463,6 +463,12 @@ DEF_INTERNAL_WIDENING_OPTAB_FN (VEC_WIDEN_ABD,
> > > >                                 first,
> > > >                                 vec_widen_sabd, vec_widen_uabd,
> > > >                                 binary)
> > > > +DEF_INTERNAL_NARROWING_OPTAB_FN (VEC_ADD_HALFING_NARROW,
> > > > +                                ECF_CONST | ECF_NOTHROW,
> > > > +                                first,
> > > > +                                vec_saddh_narrow, vec_uaddh_narrow,
> > > > +                                binary, ternary)
> > >
> > > OK, I guess I should have started to look at 1/n.  Doing that now in
> > > parallel.
> > >
> > > > +
> > > > DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary)
> > > > DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)
> > > >
> > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > index 66e8a78744931c0137b83c5633c3a273fb69f003..d9d9046a8dcb7e5ca7cdf7c83e1945289950dc51 100644
> > > > --- a/gcc/match.pd
> > > > +++ b/gcc/match.pd
> > > > @@ -3181,6 +3181,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > >        || POINTER_TYPE_P (itype))
> > > >    && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))))))
> > > >
> > > > +/* Detect (n)(((w)x + (w)y) >> bitsize(y)) where w is twice the bitsize of x and
> > > > +   y and n is half the bitsize of x and y.  */
> > > > +(match (add_half_narrowing_p @0 @1)
> > > > + (convert1? (rshift (plus:c (convert@3 @0) (convert @1)) INTEGER_CST@2))
> > >
> > > why's the outer convert optional?  The checks on n and w would make
> > > a conversion required I think.  Just use (convert (rshift (... here.
> >
> > Because match.pd wouldn't let me do it without the optional conversion.
> > The test on the bitsize essentially mandates it's there anyway.
> >
> I think using (convert (rshift (plus:c (convert@3 @0) (convert @1))
> INTEGER_CST@2)) will just work.  Just using convert1 does not.
>
I may have misunderstood this, but doesn't using the same convert here
indicate they must be the same type?  I thought the reason for convert,
convert1 etc. is to capture conversions from different types?

> > > > +  (with { unsigned n = TYPE_PRECISION (type);
> > > > +     unsigned w = TYPE_PRECISION (TREE_TYPE (@3));
> > > > +     unsigned x = TYPE_PRECISION (TREE_TYPE (@0)); }
> > > > +  (if (INTEGRAL_TYPE_P (type)
> > > > +       && n == x / 2
> > >
> > > Now, because of weird types it would be safer to check n * 2 == x,
> > > just in case of odd x ...
> > >
> > > Alternatively/additionally check && type_has_mode_precision_p (type)
> > >
> > > > +       && w == x * 2
> > > > +       && wi::eq_p (wi::to_wide (@2), x / 2)))))
> > > > +
> > > > /* Saturation add for unsigned integer.  */
> > > > (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type))
> > > > (match (usadd_overflow_mask @0 @1)
> > > > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > > > index 87a8b85da1592646d0a3447572e842ceb158cd97..e226d85ddba7e43dd801faec61cac0372286314a 100644
> > > > --- a/gcc/optabs.def
> > > > +++ b/gcc/optabs.def
> > > > @@ -492,6 +492,16 @@ OPTAB_D (vec_widen_uabd_hi_optab, "vec_widen_uabd_hi_$a")
> > > > OPTAB_D (vec_widen_uabd_lo_optab, "vec_widen_uabd_lo_$a")
> > > > OPTAB_D (vec_widen_uabd_odd_optab, "vec_widen_uabd_odd_$a")
> > > > OPTAB_D (vec_widen_uabd_even_optab, "vec_widen_uabd_even_$a")
> > > > +OPTAB_D (vec_saddh_narrow_optab, "vec_saddh_narrow$a")
> > > > +OPTAB_D (vec_saddh_narrow_hi_optab, "vec_saddh_narrow_hi_$a")
> > > > +OPTAB_D (vec_saddh_narrow_lo_optab, "vec_saddh_narrow_lo_$a")
> > > > +OPTAB_D (vec_saddh_narrow_odd_optab, "vec_saddh_narrow_odd_$a")
> > > > +OPTAB_D (vec_saddh_narrow_even_optab, "vec_saddh_narrow_even_$a")
> > > > +OPTAB_D (vec_uaddh_narrow_optab, "vec_uaddh_narrow$a")
> > > > +OPTAB_D (vec_uaddh_narrow_hi_optab, "vec_uaddh_narrow_hi_$a")
> > > > +OPTAB_D (vec_uaddh_narrow_lo_optab, "vec_uaddh_narrow_lo_$a")
> > > > +OPTAB_D
(vec_uaddh_narrow_odd_optab, "vec_uaddh_narrow_odd_$a")
> > > > +OPTAB_D (vec_uaddh_narrow_even_optab, "vec_uaddh_narrow_even_$a")
> > > > OPTAB_D (vec_addsub_optab, "vec_addsub$a3")
> > > > OPTAB_D (vec_fmaddsub_optab, "vec_fmaddsub$a4")
> > > > OPTAB_D (vec_fmsubadd_optab, "vec_fmsubadd$a4")
> > > > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > > > index ffb320fbf2330522f25a9f4380f4744079a42306..b590c36fad23e44ec3fb954a4d2bb856ce3fc139 100644
> > > > --- a/gcc/tree-vect-patterns.cc
> > > > +++ b/gcc/tree-vect-patterns.cc
> > > > @@ -4768,6 +4768,64 @@ vect_recog_sat_trunc_pattern (vec_info *vinfo, stmt_vec_info stmt_vinfo,
> > > >    return NULL;
> > > >  }
> > > >
> > > > +extern bool gimple_add_half_narrowing_p (tree, tree*, tree (*)(tree));
> > > > +
> > > > +/*
> > > > + * Try to detect add halfing and narrowing pattern.
> > > > + *
> > > > + * _1 = (W)a
> > > > + * _2 = (W)b
> > > > + * _3 = _1 + _2
> > > > + * _4 = _3 >> (precision(a) / 2)
> > > > + * _5 = (N)_4
> > > > + *
> > > > + * where
> > > > + * W = precision (a) * 2
> > > > + * N = precision (a) / 2
> > > > + */
> > > > +
> > > > +static gimple *
> > > > +vect_recog_add_halfing_narrow_pattern (vec_info *vinfo,
> > > > +                                      stmt_vec_info stmt_vinfo,
> > > > +                                      tree *type_out)
> > > > +{
> > > > +  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
> > > > +
> > > > +  if (!is_gimple_assign (last_stmt))
> > > > +    return NULL;
> > > > +
> > > > +  tree ops[2];
> > > > +  tree lhs = gimple_assign_lhs (last_stmt);
> > > > +
> > > > +  if (gimple_add_half_narrowing_p (lhs, ops, NULL))
> > > > +    {
> > > > +      tree itype = TREE_TYPE (ops[0]);
> > > > +      tree otype = TREE_TYPE (lhs);
> > > > +      tree v_itype = get_vectype_for_scalar_type (vinfo, itype);
> > > > +      tree v_otype = get_vectype_for_scalar_type (vinfo, otype);
> > > > +      internal_fn ifn = IFN_VEC_ADD_HALFING_NARROW;
> > > > +
> > > > +      if (v_itype != NULL_TREE && v_otype != NULL_TREE
> > > > +         &&
direct_internal_fn_supported_p (ifn, v_itype, OPTIMIZE_FOR_BOTH))
> > >
> > > why have the HI/LO and EVEN/ODD variants when you check for
> > > IFN_VEC_ADD_HALFING_NARROW only?
> >
> > Because without HI/LO we would have to pass quite a few arguments to the actual
> > instruction.  VEC_ADD_HALFING_NARROW does arithmetic as well, so the inputs
> > are spread out over the operands.  VEC_ADD_HALFING_NARROW would require 4
> > inputs, where the first two and last two are used together.  This would be completely
> > unclear from the use of the instruction itself.  I could, but it also means that if you
> > have a narrowing instruction which needs 3 inputs, the IFN needs 6.  It did not seem
> > logical to do so.
>
> I am asking why you require support for a single one out of the 5 IFNs during
> pattern recog when, for example, the target might only support _hi/_lo.
>
> Yes, the pattern has to use the "scalar" VEC_ADD_HALFING_NARROW
> (not in the packing way you implemented, but in the {V4SI,V4SI}->V4HI
> way that's also "compatible" with scalar types).  vectorizable_* will
> then select the appropriate supported variant, also based on vector
> types.  Usually patterns call vect_supportable_narrowing_operation
> (in case we have that, we do for widening), which then checks the
> variants.

Ah yes, you're right, this is a bug; it wasn't intended to require
VEC_ADD_HALFING_NARROW.  If that's the concern I have misunderstood you
and agree.  Will fix once we settle on patch 1.

Thanks for the review and patience,
Tamar

> > The alternative would have been to use just two inputs and use VEC_PERM_EXPR to
> > combine them.  This would work for HI/LO, but would then require backends to recognize
> > the permute back into hi/lo operations, taking into account endianness.  Possible but seemed
> > a roundabout way of doing it.
> >
> > Secondly it doesn't work for even/odd.  VEC_PERM would fill in only a strided value of the
> > vector at a time.
This becomes difficult for VLA, and then you have to do
> > tricks like discounting the costing of the permute if it follows an instruction
> > you have an even/odd variant of.
> >
> > Concretely using VEC_ADD_HALFING_NARROW creates more issues than it solves, but if
> > you want that variant I will respin.
> >
> > Tamar
> >
> > > > +       {
> > > > +         gcall *call = gimple_build_call_internal (ifn, 2, ops[0], ops[1]);
> > > > +         tree in_ssa = vect_recog_temp_ssa_var (otype, NULL);
> > > > +
> > > > +         gimple_call_set_lhs (call, in_ssa);
> > > > +         gimple_call_set_nothrow (call, /* nothrow_p */ false);
> > > > +         gimple_set_location (call,
> > > > +                              gimple_location (STMT_VINFO_STMT (stmt_vinfo)));
> > > > +
> > > > +         *type_out = v_otype;
> > > > +         vect_pattern_detected ("vect_recog_add_halfing_narrow_pattern",
> > > > +                                last_stmt);
> > > > +         return call;
> > > > +       }
> > > > +    }
> > > > +
> > > > +  return NULL;
> > > > +}
> > > > +
> > > > /* Detect a signed division by a constant that wouldn't be
> > > >    otherwise vectorized:
> > > > @@ -6896,6 +6954,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
> > > >   { vect_recog_bitfield_ref_pattern, "bitfield_ref" },
> > > >   { vect_recog_bit_insert_pattern, "bit_insert" },
> > > >   { vect_recog_abd_pattern, "abd" },
> > > > +  { vect_recog_add_halfing_narrow_pattern, "addhn" },
> > > >   { vect_recog_over_widening_pattern, "over_widening" },
> > > >   /* Must come after over_widening, which narrows the shift as much as
> > > >      possible beforehand.  */
> > > >
> > > > --

> --
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)