> -----Original Message-----
> From: Richard Biener <rguent...@suse.de>
> Sent: Thursday, August 21, 2025 11:55 AM
> To: Tamar Christina <tamar.christ...@arm.com>
> Cc: Richard Biener <richard.guent...@gmail.com>; gcc-patches@gcc.gnu.org; nd
> <n...@arm.com>
> Subject: RE: [PATCH 2/5]middle-end: Add detection for add halfing and
> narrowing instruction
>
> On Wed, 20 Aug 2025, Tamar Christina wrote:
>
> > > -----Original Message-----
> > > From: Richard Biener <richard.guent...@gmail.com>
> > > Sent: Wednesday, August 20, 2025 1:48 PM
> > > To: Tamar Christina <tamar.christ...@arm.com>
> > > Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; rguent...@suse.de
> > > Subject: Re: [PATCH 2/5]middle-end: Add detection for add halfing and
> > > narrowing instruction
> > >
> > > On Tue, Aug 19, 2025 at 6:29 AM Tamar Christina <tamar.christ...@arm.com>
> > > wrote:
> > > >
> > > > This adds support for detection of the ADDHN pattern in the vectorizer.
> > > >
> > > > Concretely try to detect
> > > >
> > > > _1 = (W)a
> > > > _2 = (W)b
> > > > _3 = _1 + _2
> > > > _4 = _3 >> (precision(a) / 2)
> > > > _5 = (N)_4
> > > >
> > > > where
> > > > W = precision (a) * 2
> > > > N = precision (a) / 2
> > >
> > > Hmm.  Is the widening because of UB with signed overflow?  The
> > > actual carry of a + b doesn't end up in (N)(_3 >> (precision(a) / 2)).
> > > I'd expect that for unsigned a and b you could see just
> > > (N)((a + b) >> (precision(a) / 2)), no?  Integer promotion would make
> > > this difficult to write, of course, unless the patterns exist for SImode
> > > -> HImode add-high.
> >
> > I guess the description is inaccurate; addhn extracts explicitly the high
> > bits of the results.  So the high bits will end up in the low part.
> >
> > > Also ...
> > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > > > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> > > > -m32, -m64 and no issues.
> > > >
> > > > Ok for master?
> > > > Tests in the next patch which adds the optabs to AArch64.
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >         * internal-fn.def (VEC_ADD_HALFING_NARROW,
> > > >         IFN_VEC_ADD_HALFING_NARROW_LO, IFN_VEC_ADD_HALFING_NARROW_HI,
> > > >         IFN_VEC_ADD_HALFING_NARROW_EVEN,
> > > >         IFN_VEC_ADD_HALFING_NARROW_ODD): New.
> > > >         * internal-fn.cc (commutative_binary_fn_p): Add
> > > >         IFN_VEC_ADD_HALFING_NARROW, IFN_VEC_ADD_HALFING_NARROW_LO and
> > > >         IFN_VEC_ADD_HALFING_NARROW_EVEN.
> > > >         (commutative_ternary_fn_p): Add IFN_VEC_ADD_HALFING_NARROW_HI,
> > > >         IFN_VEC_ADD_HALFING_NARROW_ODD.
> > > >         * match.pd (add_half_narrowing_p): New.
> > > >         * optabs.def (vec_saddh_narrow_optab, vec_saddh_narrow_hi_optab,
> > > >         vec_saddh_narrow_lo_optab, vec_saddh_narrow_odd_optab,
> > > >         vec_saddh_narrow_even_optab, vec_uaddh_narrow_optab,
> > > >         vec_uaddh_narrow_hi_optab, vec_uaddh_narrow_lo_optab,
> > > >         vec_uaddh_narrow_odd_optab, vec_uaddh_narrow_even_optab): New.
> > > >         * tree-vect-patterns.cc (gimple_add_half_narrowing_p): New.
> > > >         (vect_recog_add_halfing_narrow_pattern): New.
> > > >         (vect_vect_recog_func_ptrs): Use it.
> > > >         * doc/generic.texi: Document them.
> > > >         * doc/md.texi: Likewise.
> > > >
> > > > ---
> > > > diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi
> > > > index d4ac580a7a8b9cd339d26cb97f7eb963f83746a4..b32d99d4d1aad244a493d8f67b66151ff5363d0e 100644
> > > > --- a/gcc/doc/generic.texi
> > > > +++ b/gcc/doc/generic.texi
> > > > @@ -1834,6 +1834,11 @@ a value from @code{enum annot_expr_kind}, the third is an @code{INTEGER_CST}.
> > > > @tindex IFN_VEC_WIDEN_MINUS_LO
> > > > @tindex IFN_VEC_WIDEN_MINUS_EVEN
> > > > @tindex IFN_VEC_WIDEN_MINUS_ODD
> > > > +@tindex IFN_VEC_ADD_HALFING_NARROW
> > > > +@tindex IFN_VEC_ADD_HALFING_NARROW_HI
> > > > +@tindex IFN_VEC_ADD_HALFING_NARROW_LO
> > > > +@tindex IFN_VEC_ADD_HALFING_NARROW_EVEN
> > > > +@tindex IFN_VEC_ADD_HALFING_NARROW_ODD
> > > > @tindex VEC_UNPACK_HI_EXPR
> > > > @tindex VEC_UNPACK_LO_EXPR
> > > > @tindex VEC_UNPACK_FLOAT_HI_EXPR
> > > > @@ -1956,6 +1961,51 @@ vector of @code{N/2} subtractions.  In the case of
> > > > vector are subtracted from the odd @code{N/2} of the first to produce the
> > > > vector of @code{N/2} subtractions.
> > > >
> > > > +@item IFN_VEC_ADD_HALFING_NARROW
> > > > +This internal function represents widening vector addition of two input
> > > > +vectors, extracting the top half of the result and narrowing that value
> > > > +to a type half the width of the original input.
> > > > +Concretely it does @code{(|bits(a)/2|)((a w+ b) >> |bits(a)/2|)}.  Its operands
> > > > +are vectors that contain the same number of elements (@code{N}) of the same
> > > > +integral type.  The result is a vector that contains the same number (@code{N})
> > > > +of elements as the input vectors, of an integral type half as wide.  If the
> > > > +current target does not implement the corresponding optabs, the
> > > > +vectorizer may choose to split it into either a pair
> > > > +of @code{IFN_VEC_ADD_HALFING_NARROW_HI} and @code{IFN_VEC_ADD_HALFING_NARROW_LO}
> > > > +or @code{IFN_VEC_ADD_HALFING_NARROW_EVEN} and
> > > > +@code{IFN_VEC_ADD_HALFING_NARROW_ODD}, depending on what optabs the target
> > > > +implements.
> > > > +
> > > > +@item IFN_VEC_ADD_HALFING_NARROW_HI
> > > > +@itemx IFN_VEC_ADD_HALFING_NARROW_LO
> > > > +This internal function represents widening vector addition of two input
> > > > +vectors, extracting the top half of the result and narrowing that value
> > > > +to a type half the width of the original input, inserting the result as
> > > > +the high or low half of the result vector.
> > > > +Concretely it does @code{(|bits(a)/2|)((a w+ b) >> |bits(a)/2|)}.  Their
> > > > +operands are vectors that contain the same number of elements (@code{N}) of the
> > > > +same integral type.  The result is a vector that contains half as many elements,
> > > > +of an integral type half as wide.  In the case of
> > > > +@code{IFN_VEC_ADD_HALFING_NARROW_HI} the high @code{N/2} elements of the result
> > > > +are inserted into the given result vector with the low elements left untouched.
> > > > +The operation is an RMW.  In the case of @code{IFN_VEC_ADD_HALFING_NARROW_LO} the
> > > > +low @code{N/2} elements of the result are used as the full result.
> > > > +
> > > > +@item IFN_VEC_ADD_HALFING_NARROW_EVEN
> > > > +@itemx IFN_VEC_ADD_HALFING_NARROW_ODD
> > > > +This internal function represents widening vector addition of two input
> > > > +vectors, extracting the top half of the result and narrowing that value
> > > > +to a type half the width of the original input, inserting the result as
> > > > +the even or odd parts of the result vector.
> > > > +Concretely it does @code{(|bits(a)/2|)((a w+ b) >> |bits(a)/2|)}.  Their
> > > > +operands are vectors that contain the same number of elements (@code{N}) of the
> > > > +same integral type.  The result is a vector that contains half as many elements,
> > > > +of an integral type half as wide.
In the case of
> > > > +@code{IFN_VEC_ADD_HALFING_NARROW_ODD} the odd @code{N/2} elements of the result
> > > > +are inserted into the given result vector with the even elements left untouched.
> > > > +The operation is an RMW.  In the case of @code{IFN_VEC_ADD_HALFING_NARROW_EVEN}
> > > > +the even @code{N/2} elements of the result are used as the full result.
> > > > +
> > > > @item VEC_UNPACK_HI_EXPR
> > > > @itemx VEC_UNPACK_LO_EXPR
> > > > These nodes represent unpacking of the high and low parts of the input vector,
> > > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > > > index aba93f606eca59d31c103a05b2567fd4f3be55f3..cb691b56f137a0037f5178ba853911df5a65e5a7 100644
> > > > --- a/gcc/doc/md.texi
> > > > +++ b/gcc/doc/md.texi
> > > > @@ -6087,6 +6087,21 @@ vectors with N signed/unsigned elements of size S@.  Find the absolute
> > > > difference between operands 1 and 2 and widen the resulting elements.
> > > > Put the N/2 results of size 2*S in the output vector (operand 0).
> > > >
> > > > +@cindex @code{vec_saddh_narrow_hi_@var{m}} instruction pattern
> > > > +@cindex @code{vec_saddh_narrow_lo_@var{m}} instruction pattern
> > > > +@cindex @code{vec_uaddh_narrow_hi_@var{m}} instruction pattern
> > > > +@cindex @code{vec_uaddh_narrow_lo_@var{m}} instruction pattern
> > > > +@item @samp{vec_uaddh_narrow_hi_@var{m}}, @samp{vec_uaddh_narrow_lo_@var{m}}
> > > > +@itemx @samp{vec_saddh_narrow_hi_@var{m}}, @samp{vec_saddh_narrow_lo_@var{m}}
> > > > +@item @samp{vec_uaddh_narrow_even_@var{m}}, @samp{vec_uaddh_narrow_odd_@var{m}}
> > > > +@itemx @samp{vec_saddh_narrow_even_@var{m}}, @samp{vec_saddh_narrow_odd_@var{m}}
> > > > +Signed/Unsigned widening add long extract high half and narrow.  Operands 1 and
> > > > +2 are vectors with N signed/unsigned elements of size S@.  Add the
Add the > high/low > > > > +elements of 1 and 2 together in a widened precision, extract the top > > > > half and > > > > +narrow the result to half the size of S@ abd store the results in the > > > > output > > > > +vector (operand 0). Congretely it does > > > > +@code{((|bits(a)/2|)((a w+ b) >> |bits(a)/2|)} > > > > + > > > > @cindex @code{vec_addsub@var{m}3} instruction pattern > > > > @item @samp{vec_addsub@var{m}3} > > > > Alternating subtract, add with even lanes doing subtract and odd > > > > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc > > > > index > > > > 83438dd2ff57474cec999adaeabe92c0540e2a51..e600dbc4b3a0b27f78be00d5 > > > 2f7f6a54a13d7241 100644 > > > > --- a/gcc/internal-fn.cc > > > > +++ b/gcc/internal-fn.cc > > > > @@ -4442,6 +4442,9 @@ commutative_binary_fn_p (internal_fn fn) > > > > case IFN_VEC_WIDEN_PLUS_HI: > > > > case IFN_VEC_WIDEN_PLUS_EVEN: > > > > case IFN_VEC_WIDEN_PLUS_ODD: > > > > + case IFN_VEC_ADD_HALFING_NARROW: > > > > + case IFN_VEC_ADD_HALFING_NARROW_LO: > > > > + case IFN_VEC_ADD_HALFING_NARROW_EVEN: > > > > return true; > > > > > > > > default: > > > > @@ -4462,6 +4465,8 @@ commutative_ternary_fn_p (internal_fn fn) > > > > case IFN_FNMA: > > > > case IFN_FNMS: > > > > case IFN_UADDC: > > > > + case IFN_VEC_ADD_HALFING_NARROW_HI: > > > > + case IFN_VEC_ADD_HALFING_NARROW_ODD: > > > > > > Huh, how can this be correct? Are they not binary? > > > > Correct they're ternary. 
> > > >
> > > >       return true;
> > > >
> > > >     default:
> > > > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > > > index 69677dd10b980c83dec36487b1214ff066f4789b..152895f043b3ca60294b79c8301c6ff4014b955d 100644
> > > > --- a/gcc/internal-fn.def
> > > > +++ b/gcc/internal-fn.def
> > > > @@ -463,6 +463,12 @@ DEF_INTERNAL_WIDENING_OPTAB_FN (VEC_WIDEN_ABD,
> > > >                                 first,
> > > >                                 vec_widen_sabd, vec_widen_uabd,
> > > >                                 binary)
> > > > +DEF_INTERNAL_NARROWING_OPTAB_FN (VEC_ADD_HALFING_NARROW,
> > > > +                                ECF_CONST | ECF_NOTHROW,
> > > > +                                first,
> > > > +                                vec_saddh_narrow, vec_uaddh_narrow,
> > > > +                                binary, ternary)
> > >
> > > OK, I guess I should have started to look at 1/n.  Doing that now in
> > > parallel.
> > >
> > > > +
> > > > DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary)
> > > > DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)
> > > >
> > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > index 66e8a78744931c0137b83c5633c3a273fb69f003..d9d9046a8dcb7e5ca7cdf7c83e1945289950dc51 100644
> > > > --- a/gcc/match.pd
> > > > +++ b/gcc/match.pd
> > > > @@ -3181,6 +3181,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > >        || POINTER_TYPE_P (itype))
> > > >    && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))))))
> > > >
> > > > +/* Detect (n)(((w)x + (w)y) >> bitsize(y)) where w is twice the bitsize of x and
> > > > +   y and n is half the bitsize of x and y.  */
> > > > +(match (add_half_narrowing_p @0 @1)
> > > > + (convert1? (rshift (plus:c (convert@3 @0) (convert @1)) INTEGER_CST@2))
> > >
> > > why's the outer convert optional?  The checks on n and w would make
> > > a conversion required I think.  Just use (convert (rshift (... here.
> >
> > Because match.pd wouldn't let me do it without the optional conversion.
> > The test on the bitsize essentially mandates it's there anyway.
> >
> I think using (convert (rshift (plus:c (convert@3 @0) (convert @1))
> INTEGER_CST@2)) will just work.  Just using convert1 does not.
>
I may have misunderstood this, but doesn't using the same convert here
indicate they must be the same type?  I thought the reason for convert,
convert1 etc. is to capture conversions from different types?

> > > > +  (with { unsigned n = TYPE_PRECISION (type);
> > > > +     unsigned w = TYPE_PRECISION (TREE_TYPE (@3));
> > > > +     unsigned x = TYPE_PRECISION (TREE_TYPE (@0)); }
> > > > +  (if (INTEGRAL_TYPE_P (type)
> > > > +       && n == x / 2
> > >
> > > Now, because of weird types it would be safer to check n * 2 == x,
> > > just in case of odd x ...
> > >
> > > Alternatively/additionally check && type_has_mode_precision_p (type)
> > >
> > > > +       && w == x * 2
> > > > +       && wi::eq_p (wi::to_wide (@2), x / 2)))))
> > > > +
> > > > /* Saturation add for unsigned integer.  */
> > > > (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type))
> > > > (match (usadd_overflow_mask @0 @1)
> > > > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > > > index 87a8b85da1592646d0a3447572e842ceb158cd97..e226d85ddba7e43dd801faec61cac0372286314a 100644
> > > > --- a/gcc/optabs.def
> > > > +++ b/gcc/optabs.def
> > > > @@ -492,6 +492,16 @@ OPTAB_D (vec_widen_uabd_hi_optab, "vec_widen_uabd_hi_$a")
> > > > OPTAB_D (vec_widen_uabd_lo_optab, "vec_widen_uabd_lo_$a")
> > > > OPTAB_D (vec_widen_uabd_odd_optab, "vec_widen_uabd_odd_$a")
> > > > OPTAB_D (vec_widen_uabd_even_optab, "vec_widen_uabd_even_$a")
> > > > +OPTAB_D (vec_saddh_narrow_optab, "vec_saddh_narrow$a")
> > > > +OPTAB_D (vec_saddh_narrow_hi_optab, "vec_saddh_narrow_hi_$a")
> > > > +OPTAB_D (vec_saddh_narrow_lo_optab, "vec_saddh_narrow_lo_$a")
> > > > +OPTAB_D (vec_saddh_narrow_odd_optab, "vec_saddh_narrow_odd_$a")
> > > > +OPTAB_D (vec_saddh_narrow_even_optab, "vec_saddh_narrow_even_$a")
> > > > +OPTAB_D (vec_uaddh_narrow_optab, "vec_uaddh_narrow$a")
> > > > +OPTAB_D (vec_uaddh_narrow_hi_optab, "vec_uaddh_narrow_hi_$a")
> > > > +OPTAB_D (vec_uaddh_narrow_lo_optab, "vec_uaddh_narrow_lo_$a")
> > > > +OPTAB_D
(vec_uaddh_narrow_odd_optab, "vec_uaddh_narrow_odd_$a")
> > > > +OPTAB_D (vec_uaddh_narrow_even_optab, "vec_uaddh_narrow_even_$a")
> > > > OPTAB_D (vec_addsub_optab, "vec_addsub$a3")
> > > > OPTAB_D (vec_fmaddsub_optab, "vec_fmaddsub$a4")
> > > > OPTAB_D (vec_fmsubadd_optab, "vec_fmsubadd$a4")
> > > > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > > > index ffb320fbf2330522f25a9f4380f4744079a42306..b590c36fad23e44ec3fb954a4d2bb856ce3fc139 100644
> > > > --- a/gcc/tree-vect-patterns.cc
> > > > +++ b/gcc/tree-vect-patterns.cc
> > > > @@ -4768,6 +4768,64 @@ vect_recog_sat_trunc_pattern (vec_info *vinfo, stmt_vec_info stmt_vinfo,
> > > >    return NULL;
> > > >  }
> > > >
> > > > +extern bool gimple_add_half_narrowing_p (tree, tree*, tree (*)(tree));
> > > > +
> > > > +/*
> > > > + * Try to detect add halfing and narrowing pattern.
> > > > + *
> > > > + * _1 = (W)a
> > > > + * _2 = (W)b
> > > > + * _3 = _1 + _2
> > > > + * _4 = _3 >> (precision(a) / 2)
> > > > + * _5 = (N)_4
> > > > + *
> > > > + * where
> > > > + * W = precision (a) * 2
> > > > + * N = precision (a) / 2
> > > > + */
> > > > +
> > > > +static gimple *
> > > > +vect_recog_add_halfing_narrow_pattern (vec_info *vinfo,
> > > > +                                      stmt_vec_info stmt_vinfo,
> > > > +                                      tree *type_out)
> > > > +{
> > > > +  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
> > > > +
> > > > +  if (!is_gimple_assign (last_stmt))
> > > > +    return NULL;
> > > > +
> > > > +  tree ops[2];
> > > > +  tree lhs = gimple_assign_lhs (last_stmt);
> > > > +
> > > > +  if (gimple_add_half_narrowing_p (lhs, ops, NULL))
> > > > +    {
> > > > +      tree itype = TREE_TYPE (ops[0]);
> > > > +      tree otype = TREE_TYPE (lhs);
> > > > +      tree v_itype = get_vectype_for_scalar_type (vinfo, itype);
> > > > +      tree v_otype = get_vectype_for_scalar_type (vinfo, otype);
> > > > +      internal_fn ifn = IFN_VEC_ADD_HALFING_NARROW;
> > > > +
> > > > +      if (v_itype != NULL_TREE && v_otype != NULL_TREE
> > > > +         &&
direct_internal_fn_supported_p (ifn, v_itype, OPTIMIZE_FOR_BOTH))
> > >
> > > why have the HI/LO and EVEN/ODD variants when you check for
> > > IFN_VEC_ADD_HALFING_NARROW only?
> >
> > Because without HI/LO we would have to pass quite a few arguments to the actual
> > instruction.  VEC_ADD_HALFING_NARROW does arithmetic as well, so the inputs
> > are spread out over the operands.  VEC_ADD_HALFING_NARROW would require 4
> > inputs, where the first two and last two are used together.  This would be completely
> > unclear from the use of the instruction itself.  I could, but it also means that if you
> > have a narrowing instruction which needs 3 inputs, the IFN needs 6.  It did not seem
> > logical to do so.
>
> I am asking why you require support for a single one out of the 5 IFNs during
> pattern recog when, for example, the target might only support _hi/_lo.
>
> Yes, the pattern has to use the "scalar" VEC_ADD_HALFING_NARROW
> (not in the packing way you implemented, but in the {V4SI,V4SI}->V4HI
> way that's also "compatible" with scalar types).  vectorizable_* will
> then select the appropriate supported variant, also based on vector
> types.  Usually patterns call vect_supportable_narrowing_operation
> (in case we have that, we do for widening), which then checks the
> variants.

Ah yes, you're right, this is a bug; it wasn't intended to require
VEC_ADD_HALFING_NARROW.  If that's the concern I have misunderstood you
and agree.  Will fix once we settle on patch 1.

Thanks for the review and patience,
Tamar

> > The alternative would have been to use just two inputs and use VEC_PERM_EXPR to
> > combine them.  This would work for HI/LO, but would then require backends to recognize
> > the permute back into hi/lo operations, taking into account endianness.  Possible but seemed
> > a roundabout way of doing it.
> >
> > Secondly it doesn't work for even/odd.  VEC_PERM would fill in only a strided value of the
> > vector at a time.
This becomes difficult for VLA, and then you have to do
> > tricks like discounting the costing of the permute if it follows an instruction
> > you have an even/odd variant of.
> >
> > Concretely using VEC_ADD_HALFING_NARROW creates more issues than it solves, but if
> > you want that variant I will respin.
> >
> > Tamar
> >
> > > > +       {
> > > > +         gcall *call = gimple_build_call_internal (ifn, 2, ops[0], ops[1]);
> > > > +         tree in_ssa = vect_recog_temp_ssa_var (otype, NULL);
> > > > +
> > > > +         gimple_call_set_lhs (call, in_ssa);
> > > > +         gimple_call_set_nothrow (call, /* nothrow_p */ false);
> > > > +         gimple_set_location (call,
> > > > +                              gimple_location (STMT_VINFO_STMT (stmt_vinfo)));
> > > > +
> > > > +         *type_out = v_otype;
> > > > +         vect_pattern_detected ("vect_recog_add_halfing_narrow_pattern",
> > > > +                                last_stmt);
> > > > +         return call;
> > > > +       }
> > > > +    }
> > > > +
> > > > +  return NULL;
> > > > +}
> > > > +
> > > > /* Detect a signed division by a constant that wouldn't be
> > > >    otherwise vectorized:
> > > > @@ -6896,6 +6954,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
> > > >   { vect_recog_bitfield_ref_pattern, "bitfield_ref" },
> > > >   { vect_recog_bit_insert_pattern, "bit_insert" },
> > > >   { vect_recog_abd_pattern, "abd" },
> > > > +  { vect_recog_add_halfing_narrow_pattern, "addhn" },
> > > >   { vect_recog_over_widening_pattern, "over_widening" },
> > > >   /* Must come after over_widening, which narrows the shift as much as
> > > >      possible beforehand.  */
> > > >
> > > > --

> --
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)