On Mon, 6 Oct 2025, Tamar Christina wrote:
> > -----Original Message-----
> > From: Richard Biener <[email protected]>
> > Sent: 06 October 2025 17:13
> > To: Tamar Christina <[email protected]>
> > Cc: Richard Biener <[email protected]>; [email protected]; nd
> > <[email protected]>
> > Subject: Re: [PATCH 1/9]middle-end: refactor WIDEN_SUM_EXPR into convert
> > optab [PR122069]
> >
> >
> >
> > > Am 06.10.2025 um 18:03 schrieb Tamar Christina
> > <[email protected]>:
> > >
> > >
> > >>
> > >> -----Original Message-----
> > >> From: Richard Biener <[email protected]>
> > >> Sent: 06 October 2025 09:26
> > >> To: Tamar Christina <[email protected]>
> > >> Cc: [email protected]; nd <[email protected]>
> > >> Subject: Re: [PATCH 1/9]middle-end: refactor WIDEN_SUM_EXPR into
> > convert
> > >> optab [PR122069]
> > >>
> > >>> On Fri, 3 Oct 2025, Tamar Christina wrote:
> > >>>
> > >>> This patch changes the widen_[us]sum optabs into a convert optabs such
> > >> that
> > >>> targets and specify more than one conversion.
> > >>>
> > >>> Following this patch are patches rewriting all targets using this
> > >>> change.
> > >>>
> > >>> While working on this I noticed that the pattern does miss some cases it
> > >>> could handle if it tried multiple attempts. e.g. if the promotion is
> > >>> from
> > >>> qi to si, and the target doesn't have this, it should try hi -> si.
> > >>>
> > >>> But I'm leaving that for now.
> > >>>
> > >>> Bootstrapped Regtested on aarch64-none-linux-gnu,
> > >>> arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> > >>> -m32, -m64 and no issues
> > >>>
> > >>> Ok for master?
> > >>
> > >> OK.
> > >>
> > >> I'll note we might want to document that this, the dot_prod and
> > >> the sad patterns are working on integer vector modes only and
> > >> copy the part of the docs from dot_prod that specifies which
> > >> of the vector output lanes the accumulation happens on (in case
> > >> this is now fully consistent on all targets).
> > >
> > > Can do, but the patterns technically don't have this limitation today
> > > though. i.e. they're not restricted to integer vector modes.
> > >
> > > You may be asking yourself "but floating point doesn't make sense here",
> > > but it does in the context of BF16 and FP8.
> > >
> > > Arithmetic in BF16 often times results in FP32 values, so e.g.
> > WIDEN_SUM_EXPR
> > > For BF16 -> FP32 can be implemented with
> > https://developer.arm.com/documentation/ddi0596/2020-12/SVE-
> > Instructions/BFDOT--vectors---BFloat16-floating-point-dot-product-
> > >
> > > And similarly for FP8
> > https://developer.arm.com/documentation/ddi0602/2023-09/SIMD-FP-
> > Instructions/FDOT--4-way--vector---8-bit-floating-point-dot-product-to-
> > single-precision--vector--
> > >
> > > We just don't support soft float emulation for these today and such you
> > > get
> > to the
> > > vectorizer with the values already promoted to FP32. But we could in the
> > future.
> > >
> > > Given that the patterns don't actually have this restriction today, do
> > > you still
> > want
> > > the docs update?
> >
> > I see the optabs are marked with $I though. Not sure what that means in
> > this
> > context. IMO they all should be consistent in this regard?
>
> Ah, I see, yeah the optabs def have a stricter requirement than the
> documentation
> or the vector patterns have. I hadn't noticed that.
>
> I'll double check the rest, but I propose dropping the $I then. It looks
> like today the
> vectorizer would just fail in verify_gimple.
Sure, it makes sense to support BF16 and FP8 if instructions with the
same semantics exist there.
Richard.
> Thanks,
> Tamar
>
> >
> > Richard
> >
> > > Thanks,
> > > Tamar
> > >
> > >>
> > >> I do wonder whether it makes sense to differentiate between vector
> > >> and non-vector modes in optabs.def and gen*, but that's a much
> > >> larger task.
> > >>
> > >> Thanks,
> > >> Richard.
> > >>
> > >>> Thanks,
> > >>> Tamar
> > >>>
> > >>> gcc/ChangeLog:
> > >>>
> > >>> PR middle-end/122069
> > >>> * doc/md.texi (widen_ssum@var{n}@var{m}3,
> > >> widen_usum@var{n}@var{m}3):
> > >>> Update docs.
> > >>> * optabs.cc (expand_widen_pattern_expr): Add WIDEN_SUM_EXPR as
> > >> widening.
> > >>> * optabs.def (ssum_widen_optab, usum_widen_optab): Convert
> > >> from direct
> > >>> to a conversion optab.
> > >>> * tree-vect-patterns.cc (vect_recog_widen_sum_pattern): Change
> > >>> vect_supportable_direct_optab_p into
> > >> vect_supportable_conv_optab_p.
> > >>>
> > >>> gcc/testsuite/ChangeLog:
> > >>>
> > >>> PR middle-end/122069
> > >>> * gcc.dg/vect/slp-reduc-3.c: vect_widen_sum_hi_to_si_pattern
> > >> targets now
> > >>> pass.
> > >>>
> > >>> ---
> > >>> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > >>> index
> > >>
> > 44e1149bea89b18903061713e8319d834b76adbf..97d21b90a650e5e5fad
> > >> 5cd72b01f30983ca4ab43 100644
> > >>> --- a/gcc/doc/md.texi
> > >>> +++ b/gcc/doc/md.texi
> > >>> @@ -5847,15 +5847,15 @@ equal or wider than the mode of the
> > absolute
> > >> difference. The result is placed
> > >>> in operand 0, which is of the same mode as operand 3.
> > >>> @var{m} is the mode of operand 1 and operand 2.
> > >>>
> > >>> -@cindex @code{widen_ssum@var{m}3} instruction pattern
> > >>> -@cindex @code{widen_usum@var{m}3} instruction pattern
> > >>> -@item @samp{widen_ssum@var{m}3}
> > >>> -@itemx @samp{widen_usum@var{m}3}
> > >>> +@cindex @code{widen_ssum@var{n}@var{m}3} instruction pattern
> > >>> +@cindex @code{widen_usum@var{n}@var{m}3} instruction pattern
> > >>> +@item @samp{widen_ssum@var{n}@var{m}3}
> > >>> +@itemx @samp{widen_usum@var{n}@var{m}3}
> > >>> Operands 0 and 2 are of the same mode, which is wider than the mode of
> > >>> operand 1. Add operand 1 to operand 2 and place the widened result in
> > >>> operand 0. (This is used express accumulation of elements into an
> > >> accumulator
> > >>> of a wider mode.)
> > >>> -@var{m} is the mode of operand 1.
> > >>> +@var{m} is the mode of operand 1 and @var{n} is the mode of operand
> > 0.
> > >>>
> > >>> @cindex @code{smulhs@var{m}3} instruction pattern
> > >>> @cindex @code{umulhs@var{m}3} instruction pattern
> > >>> diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> > >>> index
> > >>
> > 5c9450f61450fa4425d08339a1c2b5f7f5e654ec..0865fc2e19aeb2b3056c86
> > >> 34334d6c1644a3cc96 100644
> > >>> --- a/gcc/optabs.cc
> > >>> +++ b/gcc/optabs.cc
> > >>> @@ -322,6 +322,10 @@ expand_widen_pattern_expr (const_sepops
> > ops,
> > >> rtx op0, rtx op1, rtx wide_op,
> > >>> icode = find_widening_optab_handler (widen_pattern_optab,
> > >>> TYPE_MODE (TREE_TYPE (ops-
> > >>> op2)),
> > >>> tmode0);
> > >>> + else if (ops->code == WIDEN_SUM_EXPR)
> > >>> + icode = find_widening_optab_handler (widen_pattern_optab,
> > >>> + TYPE_MODE (TREE_TYPE (ops-
> > >>> op1)),
> > >>> + tmode0);
> > >>> else
> > >>> icode = optab_handler (widen_pattern_optab, tmode0);
> > >>> gcc_assert (icode != CODE_FOR_nothing);
> > >>> diff --git a/gcc/optabs.def b/gcc/optabs.def
> > >>> index
> > >>
> > 790e43f08f476c8025dc2797f9ecaffe5b66acc5..e2ffb2b6423893b5dd757af
> > >> 1ed3f342ce8c9f76a 100644
> > >>> --- a/gcc/optabs.def
> > >>> +++ b/gcc/optabs.def
> > >>> @@ -85,6 +85,8 @@ OPTAB_CD(smsub_widen_optab, "msub$b$a4")
> > >>> OPTAB_CD(umsub_widen_optab, "umsub$b$a4")
> > >>> OPTAB_CD(ssmsub_widen_optab, "ssmsub$b$a4")
> > >>> OPTAB_CD(usmsub_widen_optab, "usmsub$a$b4")
> > >>> +OPTAB_CD(ssum_widen_optab, "widen_ssum$I$a$b3")
> > >>> +OPTAB_CD(usum_widen_optab, "widen_usum$I$a$b3")
> > >>> OPTAB_CD(crc_optab, "crc$a$b4")
> > >>> OPTAB_CD(crc_rev_optab, "crc_rev$a$b4")
> > >>> OPTAB_CD(vec_load_lanes_optab, "vec_load_lanes$a$b")
> > >>> @@ -415,8 +417,6 @@ OPTAB_D (savg_floor_optab, "avg$a3_floor")
> > >>> OPTAB_D (uavg_floor_optab, "uavg$a3_floor")
> > >>> OPTAB_D (savg_ceil_optab, "avg$a3_ceil")
> > >>> OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
> > >>> -OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
> > >>> -OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
> > >>> OPTAB_D (usad_optab, "usad$I$a")
> > >>> OPTAB_D (ssad_optab, "ssad$I$a")
> > >>> OPTAB_D (smulhs_optab, "smulhs$a3")
> > >>> diff --git a/gcc/testsuite/gcc.dg/vect/slp-reduc-3.c
> > >> b/gcc/testsuite/gcc.dg/vect/slp-reduc-3.c
> > >>> index
> > >>
> > 614d8ad17ca1629af9f43cedec3cbed197d9a582..b8aff98990b202eae2a7c3
> > >> 67457113aa1b811eda 100644
> > >>> --- a/gcc/testsuite/gcc.dg/vect/slp-reduc-3.c
> > >>> +++ b/gcc/testsuite/gcc.dg/vect/slp-reduc-3.c
> > >>> @@ -60,6 +60,6 @@ int main (void)
> > >>> /* The initialization loop in main also gets vectorized. */
> > >>> /* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern:
> > >> detected" 1 "vect" { xfail *-*-* } } } */
> > >>> /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" {
> > >>> target {
> > >> vect_short_mult && { vect_widen_sum_hi_to_si && vect_unpack } } } } } */
> > >>> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1
> > "vect" {
> > >> xfail { vect_widen_sum_hi_to_si_pattern || { ! { vect_short_mult && {
> > >> vect_widen_sum_hi_to_si && vect_unpack } } } } } } } */
> > >>> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1
> > "vect" {
> > >> xfail { ! { vect_short_mult && { vect_widen_sum_hi_to_si && vect_unpack
> > >> }
> > } }
> > >> } } } */
> > >>> /* Check we can elide permutes if SLP vectorizing the reduction. */
> > >>> /* { dg-final { scan-tree-dump-times " = VEC_PERM_EXPR" 0 "vect" {
> > >>> xfail {
> > {
> > >> { vect_widen_sum_hi_to_si_pattern || { ! vect_unpack } } && { !
> > >> vect_load_lanes } } && { vect_short_mult && { vect_widen_sum_hi_to_si
> > &&
> > >> vect_unpack } } } } } } */
> > >>> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > >>> index
> > >>
> > 782327235db16384c2d71186911802daf7a15ebc..38695647f602792909c
> > >> 486ae52a3fbf8cc28b39e 100644
> > >>> --- a/gcc/tree-vect-patterns.cc
> > >>> +++ b/gcc/tree-vect-patterns.cc
> > >>> @@ -2544,8 +2544,8 @@ vect_recog_widen_sum_pattern (vec_info
> > *vinfo,
> > >>>
> > >>> vect_pattern_detected ("vect_recog_widen_sum_pattern", last_stmt);
> > >>>
> > >>> - if (!vect_supportable_direct_optab_p (vinfo, type, WIDEN_SUM_EXPR,
> > >>> - unprom0.type, type_out))
> > >>> + if (!vect_supportable_conv_optab_p (vinfo, type, WIDEN_SUM_EXPR,
> > >>> + unprom0.type, type_out))
> > >>> return NULL;
> > >>>
> > >>> var = vect_recog_temp_ssa_var (type, NULL);
> > >>>
> > >>>
> > >>>
> > >>
> > >> --
> > >> Richard Biener <[email protected]>
> > >> SUSE Software Solutions Germany GmbH,
> > >> Frankenstrasse 146, 90461 Nuernberg, Germany;
> > >> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > >> Nuernberg)
>
--
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)