> -----Original Message-----
> From: Richard Biener <[email protected]>
> Sent: 06 October 2025 17:13
> To: Tamar Christina <[email protected]>
> Cc: Richard Biener <[email protected]>; [email protected]; nd
> <[email protected]>
> Subject: Re: [PATCH 1/9]middle-end: refactor WIDEN_SUM_EXPR into convert
> optab [PR122069]
>
>
>
> > On 06.10.2025 at 18:03, Tamar Christina <[email protected]> wrote:
> >
> >
> >>
> >> -----Original Message-----
> >> From: Richard Biener <[email protected]>
> >> Sent: 06 October 2025 09:26
> >> To: Tamar Christina <[email protected]>
> >> Cc: [email protected]; nd <[email protected]>
> >> Subject: Re: [PATCH 1/9]middle-end: refactor WIDEN_SUM_EXPR into convert
> >> optab [PR122069]
> >>
> >>> On Fri, 3 Oct 2025, Tamar Christina wrote:
> >>>
> >>> This patch changes the widen_[us]sum optabs into convert optabs such that
> >>> targets can specify more than one conversion.
> >>>
> >>> Following this patch are patches rewriting all targets using this change.
> >>>
> >>> While working on this I noticed that the pattern does miss some cases it
> >>> could handle if it tried multiple attempts, e.g. if the promotion is from
> >>> qi to si and the target doesn't have this, it should try hi -> si.
> >>>
> >>> But I'm leaving that for now.
> >>>
> >>> Bootstrapped Regtested on aarch64-none-linux-gnu,
> >>> arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> >>> -m32, -m64 and no issues
> >>>
> >>> Ok for master?
> >>
> >> OK.
> >>
> >> I'll note we might want to document that this, the dot_prod and
> >> the sad patterns are working on integer vector modes only and
> >> copy the part of the docs from dot_prod that specifies which
> >> of the vector output lanes the accumulation happens on (in case
> >> this is now fully consistent on all targets).
> >
> > Can do, but the patterns technically don't have this limitation today,
> > i.e. they're not restricted to integer vector modes.
> >
> > You may be asking yourself "but floating point doesn't make sense here",
> > but it does in the context of BF16 and FP8.
> >
> > Arithmetic in BF16 often results in FP32 values, so e.g. WIDEN_SUM_EXPR
> > for BF16 -> FP32 can be implemented with
> > https://developer.arm.com/documentation/ddi0596/2020-12/SVE-Instructions/BFDOT--vectors---BFloat16-floating-point-dot-product-
> >
> > And similarly for FP8:
> > https://developer.arm.com/documentation/ddi0602/2023-09/SIMD-FP-Instructions/FDOT--4-way--vector---8-bit-floating-point-dot-product-to-single-precision--vector--
> >
> > We just don't support soft-float emulation for these today, and as such you
> > get to the vectorizer with the values already promoted to FP32.  But we
> > could in the future.
> >
> > Given that the patterns don't actually have this restriction today, do you
> > still want the docs update?
>
> I see the optabs are marked with $I though.  Not sure what that means in this
> context.  IMO they all should be consistent in this regard?
Ah, I see, yeah the optab definitions do have a stricter requirement than the
documentation or the vector patterns have.  I hadn't noticed that.
I'll double check the rest, but I propose dropping the $I then.  It looks like
today the vectorizer would just fail in verify_gimple.
Thanks,
Tamar
>
> Richard
>
> > Thanks,
> > Tamar
> >
> >>
> >> I do wonder whether it makes sense to differentiate between vector
> >> and non-vector modes in optabs.def and gen*, but that's a much
> >> larger task.
> >>
> >> Thanks,
> >> Richard.
> >>
> >>> Thanks,
> >>> Tamar
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>> 	PR middle-end/122069
> >>> 	* doc/md.texi (widen_ssum@var{n}@var{m}3, widen_usum@var{n}@var{m}3):
> >>> 	Update docs.
> >>> 	* optabs.cc (expand_widen_pattern_expr): Add WIDEN_SUM_EXPR as
> >>> 	widening.
> >>> 	* optabs.def (ssum_widen_optab, usum_widen_optab): Convert from
> >>> 	direct to a conversion optab.
> >>> 	* tree-vect-patterns.cc (vect_recog_widen_sum_pattern): Change
> >>> 	vect_supportable_direct_optab_p into vect_supportable_conv_optab_p.
> >>>
> >>> gcc/testsuite/ChangeLog:
> >>>
> >>> 	PR middle-end/122069
> >>> 	* gcc.dg/vect/slp-reduc-3.c: vect_widen_sum_hi_to_si_pattern
> >>> 	targets now pass.
> >>>
> >>> ---
> >>> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> >>> index 44e1149bea89b18903061713e8319d834b76adbf..97d21b90a650e5e5fad5cd72b01f30983ca4ab43 100644
> >>> --- a/gcc/doc/md.texi
> >>> +++ b/gcc/doc/md.texi
> >>> @@ -5847,15 +5847,15 @@ equal or wider than the mode of the absolute difference.  The result is placed
> >>> in operand 0, which is of the same mode as operand 3.
> >>> @var{m} is the mode of operand 1 and operand 2.
> >>>
> >>> -@cindex @code{widen_ssum@var{m}3} instruction pattern
> >>> -@cindex @code{widen_usum@var{m}3} instruction pattern
> >>> -@item @samp{widen_ssum@var{m}3}
> >>> -@itemx @samp{widen_usum@var{m}3}
> >>> +@cindex @code{widen_ssum@var{n}@var{m}3} instruction pattern
> >>> +@cindex @code{widen_usum@var{n}@var{m}3} instruction pattern
> >>> +@item @samp{widen_ssum@var{n}@var{m}3}
> >>> +@itemx @samp{widen_usum@var{n}@var{m}3}
> >>> Operands 0 and 2 are of the same mode, which is wider than the mode of
> >>> operand 1.  Add operand 1 to operand 2 and place the widened result in
> >>> operand 0.  (This is used express accumulation of elements into an accumulator
> >>> of a wider mode.)
> >>> -@var{m} is the mode of operand 1.
> >>> +@var{m} is the mode of operand 1 and @var{n} is the mode of operand 0.
> >>>
> >>> @cindex @code{smulhs@var{m}3} instruction pattern
> >>> @cindex @code{umulhs@var{m}3} instruction pattern
> >>> diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> >>> index 5c9450f61450fa4425d08339a1c2b5f7f5e654ec..0865fc2e19aeb2b3056c8634334d6c1644a3cc96 100644
> >>> --- a/gcc/optabs.cc
> >>> +++ b/gcc/optabs.cc
> >>> @@ -322,6 +322,10 @@ expand_widen_pattern_expr (const_sepops ops, rtx op0, rtx op1, rtx wide_op,
> >>>      icode = find_widening_optab_handler (widen_pattern_optab,
> >>> 					 TYPE_MODE (TREE_TYPE (ops->op2)),
> >>> 					 tmode0);
> >>> +  else if (ops->code == WIDEN_SUM_EXPR)
> >>> +    icode = find_widening_optab_handler (widen_pattern_optab,
> >>> +					 TYPE_MODE (TREE_TYPE (ops->op1)),
> >>> +					 tmode0);
> >>>    else
> >>>      icode = optab_handler (widen_pattern_optab, tmode0);
> >>>    gcc_assert (icode != CODE_FOR_nothing);
> >>> diff --git a/gcc/optabs.def b/gcc/optabs.def
> >>> index 790e43f08f476c8025dc2797f9ecaffe5b66acc5..e2ffb2b6423893b5dd757af1ed3f342ce8c9f76a 100644
> >>> --- a/gcc/optabs.def
> >>> +++ b/gcc/optabs.def
> >>> @@ -85,6 +85,8 @@ OPTAB_CD(smsub_widen_optab, "msub$b$a4")
> >>> OPTAB_CD(umsub_widen_optab, "umsub$b$a4")
> >>> OPTAB_CD(ssmsub_widen_optab, "ssmsub$b$a4")
> >>> OPTAB_CD(usmsub_widen_optab, "usmsub$a$b4")
> >>> +OPTAB_CD(ssum_widen_optab, "widen_ssum$I$a$b3")
> >>> +OPTAB_CD(usum_widen_optab, "widen_usum$I$a$b3")
> >>> OPTAB_CD(crc_optab, "crc$a$b4")
> >>> OPTAB_CD(crc_rev_optab, "crc_rev$a$b4")
> >>> OPTAB_CD(vec_load_lanes_optab, "vec_load_lanes$a$b")
> >>> @@ -415,8 +417,6 @@ OPTAB_D (savg_floor_optab, "avg$a3_floor")
> >>> OPTAB_D (uavg_floor_optab, "uavg$a3_floor")
> >>> OPTAB_D (savg_ceil_optab, "avg$a3_ceil")
> >>> OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
> >>> -OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
> >>> -OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
> >>> OPTAB_D (usad_optab, "usad$I$a")
> >>> OPTAB_D (ssad_optab, "ssad$I$a")
> >>> OPTAB_D (smulhs_optab, "smulhs$a3")
> >>> diff --git a/gcc/testsuite/gcc.dg/vect/slp-reduc-3.c b/gcc/testsuite/gcc.dg/vect/slp-reduc-3.c
> >>> index 614d8ad17ca1629af9f43cedec3cbed197d9a582..b8aff98990b202eae2a7c367457113aa1b811eda 100644
> >>> --- a/gcc/testsuite/gcc.dg/vect/slp-reduc-3.c
> >>> +++ b/gcc/testsuite/gcc.dg/vect/slp-reduc-3.c
> >>> @@ -60,6 +60,6 @@ int main (void)
> >>> /* The initialization loop in main also gets vectorized. */
> >>> /* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected" 1 "vect" { xfail *-*-* } } } */
> >>> /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_short_mult && { vect_widen_sum_hi_to_si && vect_unpack } } } } } */
> >>> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { vect_widen_sum_hi_to_si_pattern || { ! { vect_short_mult && { vect_widen_sum_hi_to_si && vect_unpack } } } } } } } */
> >>> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { ! { vect_short_mult && { vect_widen_sum_hi_to_si && vect_unpack } } } } } } */
> >>> /* Check we can elide permutes if SLP vectorizing the reduction.  */
> >>> /* { dg-final { scan-tree-dump-times " = VEC_PERM_EXPR" 0 "vect" { xfail { { { vect_widen_sum_hi_to_si_pattern || { ! vect_unpack } } && { ! vect_load_lanes } } && { vect_short_mult && { vect_widen_sum_hi_to_si && vect_unpack } } } } } } */
> >>> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> >>> index 782327235db16384c2d71186911802daf7a15ebc..38695647f602792909c486ae52a3fbf8cc28b39e 100644
> >>> --- a/gcc/tree-vect-patterns.cc
> >>> +++ b/gcc/tree-vect-patterns.cc
> >>> @@ -2544,8 +2544,8 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
> >>>
> >>> vect_pattern_detected ("vect_recog_widen_sum_pattern", last_stmt);
> >>>
> >>> - if (!vect_supportable_direct_optab_p (vinfo, type, WIDEN_SUM_EXPR,
> >>> - unprom0.type, type_out))
> >>> + if (!vect_supportable_conv_optab_p (vinfo, type, WIDEN_SUM_EXPR,
> >>> + unprom0.type, type_out))
> >>> return NULL;
> >>>
> >>> var = vect_recog_temp_ssa_var (type, NULL);
> >>>
> >>>
> >>>
> >>
> >> --
> >> Richard Biener <[email protected]>
> >> SUSE Software Solutions Germany GmbH,
> >> Frankenstrasse 146, 90461 Nuernberg, Germany;
> >> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> >> Nuernberg)