On Sun, Mar 15, 2026 at 12:38 PM Philipp Tomsich
<[email protected]> wrote:
>
> On Sun, 15 Mar 2026 at 11:35, Richard Biener <[email protected]>
> wrote:
> >
> > On Fri, Mar 13, 2026 at 10:39 PM Philipp Tomsich
> > <[email protected]> wrote:
> > >
> > > Add a new SSA pass (pass_widen_accum) that widens narrow integer
> > > loop accumulators (e.g. short, char) to int-width, eliminating
> > > per-iteration sign-/zero-extension truncations.
> > >
> > > The pass is gated on -ftree-widen-accum, enabled at -O2 and above.
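
For readers following the thread, the intended transformation can be sketched in plain C (a hand-written illustration, not output of the pass; it relies on GCC's modulo-2^N behavior for out-of-range conversions to signed integer types, which the patch's rationale also assumes):

```c
/* Deferring the per-iteration truncation to the loop exit preserves
   the result, because two's-complement addition is associative modulo
   2^N.  sum_narrow truncates to 16 bits every iteration; sum_wide
   accumulates in int and narrows once at the exit.  */

short sum_narrow (int n, const int *a)
{
  short ret = 0;                /* truncated every iteration */
  for (int i = 0; i < n; i++)
    ret += a[i];
  return ret;
}

short sum_wide (int n, const int *a)
{
  int ret = 0;                  /* widened accumulator */
  for (int i = 0; i < n; i++)
    ret += a[i];
  return (short) ret;           /* single narrowing cast at the exit */
}
```

This is essentially what the run test widen-accum-5.c below checks: both variants agree even when the sum wraps past SHRT_MAX.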
> >
> > Few quick comments, not a thorough review.
> >
> > First I wonder why this is not part of the widening_mul pass.
>
> We had originally (from our internal ticket) considered
> pass_optimize_widening_mul (which has a similar name), the
> vect_recog_widen_sum_pattern (which does the inverse of what we do),
> and a few more sharing opportunities. Each of those was rejected for
> one reason or another.
>
> For the pass_optimize_widening_mul, we found merging impractical.
> They share a scheduling slot in the pass pipeline and the "widening"
> in the names, but little else:
>
> 1. Fundamentally different iteration models.
> 1.a. widen_accumulator is loop-centric: it iterates over loops from
> innermost outward;
> 1.b. widening_mul is statement-centric: it walks the dominator tree
> visiting every BB. widening_mul has no concept of loops, headers, or
> back edges.
> 2. Loop infrastructure mismatch: widen_accumulator requires loop
> infrastructure; merging would force loop_optimizer_init(LOOPS_NORMAL)
> and loop_optimizer_finalize() into every invocation of widening_mul
> 3. Different SSA/CFG modification patterns
> 4. no shared analysis or transformation logic
> 4.a. widening_mul: MULT_EXPR, WIDEN_MULT_EXPR, FMA, divmod, saturation
> arithmetic, bswap
> 4.b. widen_accumulator: PLUS_EXPR/MINUS_EXPR in loop-header PHI
> accumulator chains
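
To make 4.b concrete, here is a hand-written C sketch (not taken from the patch) of the accumulator-chain shape the pass targets: conditional additive updates feeding a loop-header PHI, with the narrow type's truncation deferrable to the exit:

```c
/* acc_narrow truncates the short accumulator every iteration through
   both conditional arms; acc_wide keeps the chain in int and narrows
   once at the loop exit.  Modulo-2^16 arithmetic makes them agree.  */

short acc_narrow (int n, const int *a, int val)
{
  short ret = 0;
  for (int i = 0; i < n; i++)
    {
      if (a[i] > val)
        ret += 10;
      else
        ret += (a[i] > 0);      /* conditional addend, as in test 1 */
    }
  return ret;
}

short acc_wide (int n, const int *a, int val)
{
  int ret = 0;                  /* widened accumulator chain */
  for (int i = 0; i < n; i++)
    {
      if (a[i] > val)
        ret += 10;
      else
        ret += (a[i] > 0);
    }
  return (short) ret;           /* single exit cast */
}
```

This mirrors widen-accum-1.c in the patch, where the two arms become a merge PHI feeding the header PHI in GIMPLE.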
OK, thanks for explaining; the "because it uses LC SSA" remark was what confused me.
> We'll address the remaining findings in a v2.
>
> Thank you for the review,
> --Philipp
>
> > > gcc/ChangeLog:
> > >
> > > * common.opt (ftree-widen-accum): New flag.
> > > * opts.cc (default_options_table): Enable at -O2+.
> > > * params.opt (max-widen-accum-chain-depth): New param.
> > > * tree-ssa-loop-widen-accum.cc: New file.
> > > * Makefile.in (OBJS): Add tree-ssa-loop-widen-accum.o.
> > > * passes.def (pass_widen_accumulator): Schedule after phiopt,
> > > before widening_mul which needs LCSSA.
> > > * timevar.def (TV_TREE_WIDEN_ACCUM): New timevar.
> > > * tree-pass.h (make_pass_widen_accumulator): Declare.
> > > * doc/invoke.texi (-ftree-widen-accum): Document new flag.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.dg/tree-ssa/widen-accum-1.c: New test.
> > > * gcc.dg/tree-ssa/widen-accum-2.c: New test.
> > > * gcc.dg/tree-ssa/widen-accum-3.c: New test.
> > > * gcc.dg/tree-ssa/widen-accum-4.c: New test.
> > > * gcc.dg/tree-ssa/widen-accum-5.c: New test.
> > > * gcc.dg/tree-ssa/widen-accum-6.c: New test.
> > > * gcc.dg/tree-ssa/widen-accum-7.c: New test.
> > > * gcc.dg/tree-ssa/widen-accum-8.c: New test.
> > > * gcc.target/riscv/widen-accum-1.c: New test.
> > > * gcc.target/aarch64/widen-accum-1.c: New test.
> > > ---
> > > gcc/Makefile.in | 1 +
> > > gcc/common.opt | 4 +
> > > gcc/doc/invoke.texi | 14 +-
> > > gcc/opts.cc | 1 +
> > > gcc/params.opt | 4 +
> > > gcc/passes.def | 1 +
> > > gcc/testsuite/gcc.dg/tree-ssa/widen-accum-1.c | 20 +
> > > gcc/testsuite/gcc.dg/tree-ssa/widen-accum-2.c | 19 +
> > > gcc/testsuite/gcc.dg/tree-ssa/widen-accum-3.c | 26 +
> > > gcc/testsuite/gcc.dg/tree-ssa/widen-accum-4.c | 20 +
> > > gcc/testsuite/gcc.dg/tree-ssa/widen-accum-5.c | 33 +
> > > gcc/testsuite/gcc.dg/tree-ssa/widen-accum-6.c | 15 +
> > > gcc/testsuite/gcc.dg/tree-ssa/widen-accum-7.c | 15 +
> > > gcc/testsuite/gcc.dg/tree-ssa/widen-accum-8.c | 15 +
> > > .../gcc.target/aarch64/widen-accum-1.c | 15 +
> > > .../gcc.target/riscv/widen-accum-1.c | 15 +
> > > gcc/timevar.def | 1 +
> > > gcc/tree-pass.h | 1 +
> > > gcc/tree-ssa-loop-widen-accum.cc | 713 ++++++++++++++++++
> > > 19 files changed, 932 insertions(+), 1 deletion(-)
> > > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/widen-accum-1.c
> > > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/widen-accum-2.c
> > > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/widen-accum-3.c
> > > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/widen-accum-4.c
> > > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/widen-accum-5.c
> > > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/widen-accum-6.c
> > > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/widen-accum-7.c
> > > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/widen-accum-8.c
> > > create mode 100644 gcc/testsuite/gcc.target/aarch64/widen-accum-1.c
> > > create mode 100644 gcc/testsuite/gcc.target/riscv/widen-accum-1.c
> > > create mode 100644 gcc/tree-ssa-loop-widen-accum.cc
> > >
> > > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > > index 9e8da255186a..c94a8a0a5940 100644
> > > --- a/gcc/Makefile.in
> > > +++ b/gcc/Makefile.in
> > > @@ -1799,6 +1799,7 @@ OBJS = \
> > > tree-ssa-loop-prefetch.o \
> > > tree-ssa-loop-split.o \
> > > tree-ssa-loop-unswitch.o \
> > > + tree-ssa-loop-widen-accum.o \
> > > tree-ssa-loop.o \
> > > tree-ssa-math-opts.o \
> > > tree-ssa-operands.o \
> > > diff --git a/gcc/common.opt b/gcc/common.opt
> > > index 88b79bbf8f56..bd3e2707ab4c 100644
> > > --- a/gcc/common.opt
> > > +++ b/gcc/common.opt
> > > @@ -3380,6 +3380,10 @@ ftree-vrp
> > > Common Var(flag_tree_vrp) Init(0) Optimization
> > > Perform Value Range Propagation on trees.
> > >
> > > +ftree-widen-accum
> > > +Common Var(flag_tree_widen_accum) Optimization
> > > +Widen narrow-type loop accumulators to avoid per-iteration truncations.
> > > +
> > > fsplit-paths
> > > Common Var(flag_split_paths) Init(0) Optimization
> > > Split paths leading to loop backedges.
> > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > index fe20ae66c00b..d38333b59c4d 100644
> > > --- a/gcc/doc/invoke.texi
> > > +++ b/gcc/doc/invoke.texi
> > > @@ -688,7 +688,7 @@ Objective-C and Objective-C++ Dialects}.
> > > -ftree-parallelize-loops[=@var{n}] -ftree-pre -ftree-partial-pre
> > > -ftree-pta
> > > -ftree-reassoc -ftree-scev-cprop -ftree-sink -ftree-slsr -ftree-sra
> > > -ftree-switch-conversion -ftree-tail-merge
> > > --ftree-ter -ftree-vectorize -ftree-vrp -ftrivial-auto-var-init
> > > +-ftree-ter -ftree-vectorize -ftree-vrp -ftree-widen-accum
> > > -ftrivial-auto-var-init
> > > -funconstrained-commons -funit-at-a-time -funroll-all-loops
> > > -funroll-loops -funsafe-math-optimizations -funswitch-loops
> > > -fipa-ra -fvariable-expansion-in-unroller -fvect-cost-model -fvpt
> > > @@ -13463,6 +13463,7 @@ also turns on the following optimization flags:
> > > -ftree-slp-vectorize
> > > -ftree-switch-conversion -ftree-tail-merge
> > > -ftree-vrp
> > > +-ftree-widen-accum
> > > -fvect-cost-model=very-cheap}
> > >
> > > Please note the warning under @option{-fgcse} about
> > > @@ -15322,6 +15323,17 @@ enabled by default at @option{-O2} and higher.
> > > Null pointer check
> > > elimination is only done if @option{-fdelete-null-pointer-checks} is
> > > enabled.
> > >
> > > +@opindex ftree-widen-accum
> > > +@opindex fno-tree-widen-accum
> > > +@item -ftree-widen-accum
> > > +Widen narrow-type loop accumulators (e.g.@: @code{short}, @code{char})
> > > +to @code{int} width, eliminating per-iteration sign- or zero-extension
> > > +truncations. Since two's-complement addition is associative modulo
> > > +@math{2^N}, the truncation can be deferred to loop exits without
> > > +changing the observable result. This is enabled by default at
> > > +@option{-O2} and higher and is not applied when @option{-ftrapv} is
> > > +in effect.
> > > +
> > > @opindex fsplit-paths
> > > @opindex fno-split-paths
> > > @item -fsplit-paths
> > > diff --git a/gcc/opts.cc b/gcc/opts.cc
> > > index 6658b6acd378..2faab06879cd 100644
> > > --- a/gcc/opts.cc
> > > +++ b/gcc/opts.cc
> > > @@ -673,6 +673,7 @@ static const struct default_options
> > > default_options_table[] =
> > > { OPT_LEVELS_2_PLUS, OPT_ftree_switch_conversion, NULL, 1 },
> > > { OPT_LEVELS_2_PLUS, OPT_ftree_tail_merge, NULL, 1 },
> > > { OPT_LEVELS_2_PLUS, OPT_ftree_vrp, NULL, 1 },
> > > + { OPT_LEVELS_2_PLUS, OPT_ftree_widen_accum, NULL, 1 },
> > > { OPT_LEVELS_2_PLUS, OPT_fvect_cost_model_, NULL,
> > > VECT_COST_MODEL_VERY_CHEAP },
> > > { OPT_LEVELS_2_PLUS, OPT_finline_functions, NULL, 1 },
> > > diff --git a/gcc/params.opt b/gcc/params.opt
> > > index 4420189e9822..305ed70a1256 100644
> > > --- a/gcc/params.opt
> > > +++ b/gcc/params.opt
> > > @@ -846,6 +846,10 @@ Maximum size of loc list for which reverse ops
> > > should be added.
> > > Common Joined UInteger Var(param_max_vartrack_size) Init(50000000) Param
> > > Optimization
> > > Maximum size of var tracking hash tables.
> > >
> > > +-param=max-widen-accum-chain-depth=
> > > +Common Joined UInteger Var(param_max_widen_accum_chain_depth) Init(50)
> > > IntegerRange(1, 200) Param Optimization
> > > +Maximum recursion depth when analyzing or transforming accumulator
> > > chains in the widen_accum pass.
> > > +
> > > -param=max-find-base-term-values=
> > > Common Joined UInteger Var(param_max_find_base_term_values) Init(200)
> > > Param Optimization
> > > Maximum number of VALUEs handled during a single find_base_term call.
> > > diff --git a/gcc/passes.def b/gcc/passes.def
> > > index cdddb87302f6..3469dfd2e512 100644
> > > --- a/gcc/passes.def
> > > +++ b/gcc/passes.def
> > > @@ -371,6 +371,7 @@ along with GCC; see the file COPYING3. If not see
> > > NEXT_PASS (pass_forwprop, /*full_walk=*/false, /*last=*/true);
> > > NEXT_PASS (pass_sink_code, true /* unsplit edges */);
> > > NEXT_PASS (pass_phiopt, false /* early_p */);
> > > + NEXT_PASS (pass_widen_accumulator); /* Before widening_mul which
> > > needs LCSSA. */
> > > NEXT_PASS (pass_optimize_widening_mul);
> > > NEXT_PASS (pass_store_merging);
> > > /* If DCE is not run before checking for uninitialized uses,
> > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/widen-accum-1.c
> > > b/gcc/testsuite/gcc.dg/tree-ssa/widen-accum-1.c
> > > new file mode 100644
> > > index 000000000000..bfadad0201fc
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/widen-accum-1.c
> > > @@ -0,0 +1,20 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -fdump-tree-widen_accum-details" } */
> > > +
> > > +typedef short ee_s16;
> > > +
> > > +ee_s16 __attribute__((noipa))
> > > +test_widen (int N, int *A, int val)
> > > +{
> > > + ee_s16 ret = 0;
> > > + for (int i = 0; i < N; i++)
> > > + {
> > > + if (A[i] > val)
> > > + ret += 10;
> > > + else
> > > + ret += (A[i] > 0) ? 1 : 0;
> > > + }
> > > + return ret;
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "Accumulator widened successfully"
> > > "widen_accum" } } */
> > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/widen-accum-2.c
> > > b/gcc/testsuite/gcc.dg/tree-ssa/widen-accum-2.c
> > > new file mode 100644
> > > index 000000000000..5b3cecbec078
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/widen-accum-2.c
> > > @@ -0,0 +1,19 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -fdump-tree-widen_accum-details" } */
> > > +
> > > +typedef short ee_s16;
> > > +
> > > +ee_s16 __attribute__((noipa))
> > > +test_no_widen (int N, int *A, ee_s16 limit)
> > > +{
> > > + ee_s16 ret = 0;
> > > + for (int i = 0; i < N; i++)
> > > + {
> > > + ret += A[i];
> > > + if (ret > limit) /* comparison of narrow accumulator -- blocks
> > > widening */
> > > + ret = 0;
> > > + }
> > > + return ret;
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump-not "Accumulator widened successfully"
> > > "widen_accum" } } */
> > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/widen-accum-3.c
> > > b/gcc/testsuite/gcc.dg/tree-ssa/widen-accum-3.c
> > > new file mode 100644
> > > index 000000000000..24bbaa7f11cf
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/widen-accum-3.c
> > > @@ -0,0 +1,26 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -fdump-tree-widen_accum-details" } */
> > > +
> > > +typedef short ee_s16;
> > > +
> > > +/* Multi-exit loop: the early return carries an intermediate
> > > + accumulator value, not the back-edge value. The pass should
> > > + still widen successfully. */
> > > +
> > > +ee_s16 __attribute__((noipa))
> > > +test_multi_exit (int N, int *A, int val, int sentinel)
> > > +{
> > > + ee_s16 ret = 0;
> > > + for (int i = 0; i < N; i++)
> > > + {
> > > + if (A[i] == sentinel)
> > > + return ret; /* early exit with current accumulator value
> > > */
> > > + if (A[i] > val)
> > > + ret += 10;
> > > + else
> > > + ret += (A[i] > 0) ? 1 : 0;
> > > + }
> > > + return ret;
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "Accumulator widened successfully"
> > > "widen_accum" } } */
> > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/widen-accum-4.c
> > > b/gcc/testsuite/gcc.dg/tree-ssa/widen-accum-4.c
> > > new file mode 100644
> > > index 000000000000..48f1d739b5b8
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/widen-accum-4.c
> > > @@ -0,0 +1,20 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -fdump-tree-widen_accum-details" } */
> > > +
> > > +typedef short ee_s16;
> > > +
> > > +/* Accumulator with int-typed addend: GIMPLE keeps the explicit
> > > + widen/compute/truncate chain (_1 = (int)ret; _2 = _1 + A[i];
> > > + ret = (short)_2) because A[i] is int. The pass must look
> > > + through the int->short truncation in resolve_wide_back_arg. */
> > > +
> > > +ee_s16 __attribute__((noipa))
> > > +test_int_addend (int N, int *A)
> > > +{
> > > + ee_s16 ret = 0;
> > > + for (int i = 0; i < N; i++)
> > > + ret += A[i];
> > > + return ret;
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "Accumulator widened successfully"
> > > "widen_accum" } } */
> > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/widen-accum-5.c
> > > b/gcc/testsuite/gcc.dg/tree-ssa/widen-accum-5.c
> > > new file mode 100644
> > > index 000000000000..b22c6732d4c7
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/widen-accum-5.c
> > > @@ -0,0 +1,33 @@
> > > +/* { dg-do run } */
> > > +/* { dg-options "-O2" } */
> > > +
> > > +/* Verify that widening preserves wraparound semantics. */
> > > +
> > > +short __attribute__((noipa))
> > > +sum_short (int N, int *A)
> > > +{
> > > + short ret = 0;
> > > + for (int i = 0; i < N; i++)
> > > + ret += A[i];
> > > + return ret;
> > > +}
> > > +
> > > +int
> > > +main (void)
> > > +{
> > > + int A[100];
> > > + for (int i = 0; i < 100; i++)
> > > + A[i] = 400;
> > > +
> > > + /* 100 * 400 = 40000 which wraps past short (max 32767).
> > > + Compute expected result via int, then truncate. */
> > > + int ref = 0;
> > > + for (int i = 0; i < 100; i++)
> > > + ref += 400;
> > > + short expected = (short) ref;
> > > +
> > > + if (sum_short (100, A) != expected)
> > > + __builtin_abort ();
> > > +
> > > + return 0;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/widen-accum-6.c
> > > b/gcc/testsuite/gcc.dg/tree-ssa/widen-accum-6.c
> > > new file mode 100644
> > > index 000000000000..ce1b527c3062
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/widen-accum-6.c
> > > @@ -0,0 +1,15 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -fdump-tree-widen_accum-details" } */
> > > +
> > > +/* signed char accumulator. */
> > > +
> > > +signed char __attribute__((noipa))
> > > +sum_char (int N, int *A)
> > > +{
> > > + signed char ret = 0;
> > > + for (int i = 0; i < N; i++)
> > > + ret += A[i];
> > > + return ret;
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "Accumulator widened successfully"
> > > "widen_accum" } } */
> > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/widen-accum-7.c
> > > b/gcc/testsuite/gcc.dg/tree-ssa/widen-accum-7.c
> > > new file mode 100644
> > > index 000000000000..292f0812f178
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/widen-accum-7.c
> > > @@ -0,0 +1,15 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -fdump-tree-widen_accum-details" } */
> > > +
> > > +/* unsigned short accumulator. */
> > > +
> > > +unsigned short __attribute__((noipa))
> > > +sum_ushort (int N, int *A)
> > > +{
> > > + unsigned short ret = 0;
> > > + for (int i = 0; i < N; i++)
> > > + ret += A[i];
> > > + return ret;
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "Accumulator widened successfully"
> > > "widen_accum" } } */
> > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/widen-accum-8.c
> > > b/gcc/testsuite/gcc.dg/tree-ssa/widen-accum-8.c
> > > new file mode 100644
> > > index 000000000000..8c4e5fc2f9b8
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/widen-accum-8.c
> > > @@ -0,0 +1,15 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -fdump-tree-widen_accum-details" } */
> > > +
> > > +/* MINUS_EXPR accumulator. */
> > > +
> > > +short __attribute__((noipa))
> > > +sub_short (int N, int *A)
> > > +{
> > > + short ret = 0;
> > > + for (int i = 0; i < N; i++)
> > > + ret -= A[i];
> > > + return ret;
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "Accumulator widened successfully"
> > > "widen_accum" } } */
> > > diff --git a/gcc/testsuite/gcc.target/aarch64/widen-accum-1.c
> > > b/gcc/testsuite/gcc.target/aarch64/widen-accum-1.c
> > > new file mode 100644
> > > index 000000000000..b5c3841fb832
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/aarch64/widen-accum-1.c
> > > @@ -0,0 +1,15 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2" } */
> > > +/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-Os" "-Oz" "-Og" } } */
> > > +
> > > +short __attribute__((noipa))
> > > +sum_short (int N, int *A)
> > > +{
> > > + short ret = 0;
> > > + for (int i = 0; i < N; i++)
> > > + ret += A[i];
> > > + return ret;
> > > +}
> > > +
> > > +/* After widening, the loop should not need sign-extension. */
> > > +/* { dg-final { scan-assembler-not "sxth" } } */
> > > diff --git a/gcc/testsuite/gcc.target/riscv/widen-accum-1.c
> > > b/gcc/testsuite/gcc.target/riscv/widen-accum-1.c
> > > new file mode 100644
> > > index 000000000000..66693abecab7
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/riscv/widen-accum-1.c
> > > @@ -0,0 +1,15 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -march=rv64gc -mabi=lp64d" } */
> > > +/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-Os" "-Oz" "-Og" } } */
> > > +
> > > +short __attribute__((noipa))
> > > +sum_short (int N, int *A)
> > > +{
> > > + short ret = 0;
> > > + for (int i = 0; i < N; i++)
> > > + ret += A[i];
> > > + return ret;
> > > +}
> > > +
> > > +/* After widening, the loop should not need sign-extension. */
> > > +/* { dg-final { scan-assembler-not "sext\\.h" } } */
> > > diff --git a/gcc/timevar.def b/gcc/timevar.def
> > > index 3824caa01bc2..78db42f94e8b 100644
> > > --- a/gcc/timevar.def
> > > +++ b/gcc/timevar.def
> > > @@ -227,6 +227,7 @@ DEFTIMEVAR (TV_TREE_SWITCH_LOWERING, "tree switch
> > > lowering")
> > > DEFTIMEVAR (TV_TREE_RECIP , "gimple CSE reciprocals")
> > > DEFTIMEVAR (TV_TREE_SINCOS , "gimple CSE sin/cos")
> > > DEFTIMEVAR (TV_TREE_POW , "gimple expand pow")
> > > +DEFTIMEVAR (TV_TREE_WIDEN_ACCUM , "gimple widen accumulator")
> > > DEFTIMEVAR (TV_TREE_WIDEN_MUL , "gimple widening/fma detection")
> > > DEFTIMEVAR (TV_TRANS_MEM , "transactional memory")
> > > DEFTIMEVAR (TV_TREE_STRLEN , "tree strlen optimization")
> > > diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> > > index b3c97658a8fe..96fc35b2a76c 100644
> > > --- a/gcc/tree-pass.h
> > > +++ b/gcc/tree-pass.h
> > > @@ -456,6 +456,7 @@ extern gimple_opt_pass *make_pass_cse_sincos
> > > (gcc::context *ctxt);
> > > extern gimple_opt_pass *make_pass_expand_pow (gcc::context *ctxt);
> > > extern gimple_opt_pass *make_pass_optimize_bswap (gcc::context *ctxt);
> > > extern gimple_opt_pass *make_pass_store_merging (gcc::context *ctxt);
> > > +extern gimple_opt_pass *make_pass_widen_accumulator (gcc::context *ctxt);
> > > extern gimple_opt_pass *make_pass_optimize_widening_mul (gcc::context
> > > *ctxt);
> > > extern gimple_opt_pass *make_pass_warn_function_return (gcc::context
> > > *ctxt);
> > > extern gimple_opt_pass *make_pass_warn_function_noreturn (gcc::context
> > > *ctxt);
> > > diff --git a/gcc/tree-ssa-loop-widen-accum.cc
> > > b/gcc/tree-ssa-loop-widen-accum.cc
> > > new file mode 100644
> > > index 000000000000..4f41ed1bedaa
> > > --- /dev/null
> > > +++ b/gcc/tree-ssa-loop-widen-accum.cc
> > > @@ -0,0 +1,713 @@
> > > +/* Widen narrow-type loop accumulators to int.
> > > + Copyright (C) 2026 Free Software Foundation, Inc.
> > > +
> > > +This file is part of GCC.
> > > +
> > > +GCC is free software; you can redistribute it and/or modify it under
> > > +the terms of the GNU General Public License as published by the Free
> > > +Software Foundation; either version 3, or (at your option) any later
> > > +version.
> > > +
> > > +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> > > +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > > +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
> > > +for more details.
> > > +
> > > +You should have received a copy of the GNU General Public License
> > > +along with GCC; see the file COPYING3. If not see
> > > +<http://www.gnu.org/licenses/>. */
> > > +
> > > +/* Narrow-type loop accumulators (e.g. short) are truncated every
> > > + iteration, producing redundant sign/zero-extensions on targets such
> > > + as RISC-V. Since two's-complement addition is associative mod 2^N,
> > > + the truncation can be deferred to loop exits.
> > > +
> > > + This pass finds header PHIs whose type is narrower than int and
> > > + whose in-loop uses are limited to additive operations and
> > > + same-width casts. It creates a widened (int-typed) copy of the
> > > + accumulator chain and inserts a single narrowing cast at each loop
> > > + exit. */
> > > +
> > > +#include "config.h"
> > > +#include "system.h"
> > > +#include "coretypes.h"
> > > +#include "backend.h"
> > > +#include "tree.h"
> > > +#include "gimple.h"
> > > +#include "tree-pass.h"
> > > +#include "ssa.h"
> > > +#include "gimple-pretty-print.h"
> > > +#include "fold-const.h"
> > > +#include "gimple-iterator.h"
> > > +#include "tree-cfg.h"
> > > +#include "tree-ssa-loop-manip.h"
> > > +#include "tree-ssa-loop.h"
> > > +#include "cfgloop.h"
> > > +#include "tree-dfa.h"
> > > +#include "tree-ssa.h"
> > > +#include "tree-phinodes.h"
> > > +#include "tree-into-ssa.h"
> > > +
> > > +/* Return true if CODE is an additive operation or a type conversion
> > > + -- the set of operations that the accumulator chain is allowed to
> > > + contain. Shared between verify_chain_to_phi_p (backward walk) and
> > > + validate_uses_in_loop_p (forward walk) to keep them in sync. */
> > > +
> > > +static inline bool
> > > +is_additive_or_cast_p (enum tree_code code)
> > > +{
> > > + return CONVERT_EXPR_CODE_P (code)
> > > + || code == PLUS_EXPR
> > > + || code == MINUS_EXPR;
> > > +}
> > > +
> > > +/* Return true if STMT is a CONVERT_EXPR/NOP_EXPR whose input has
> > > + precision <= NARROW_PREC and whose output has precision >= input.
> > > + This matches widening casts and same-width sign-changes. */
> > > +
> > > +static bool
> > > +is_widening_or_nop_cast_p (gimple *stmt, unsigned narrow_prec)
> > > +{
> > > + if (!is_gimple_assign (stmt))
> > > + return false;
> > > + if (!CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt)))
> > > + return false;
> > > + tree rhs = gimple_assign_rhs1 (stmt);
> > > + tree lhs = gimple_assign_lhs (stmt);
> > > + if (!INTEGRAL_TYPE_P (TREE_TYPE (lhs))
> > > + || !INTEGRAL_TYPE_P (TREE_TYPE (rhs)))
> > > + return false;
> > > + return (TYPE_PRECISION (TREE_TYPE (rhs)) <= narrow_prec
> > > + && TYPE_PRECISION (TREE_TYPE (lhs))
> > > + >= TYPE_PRECISION (TREE_TYPE (rhs)));
> > > +}
> > > +
> > > +/* Walk backward from NAME through CONVERT_EXPR / PLUS_EXPR / MINUS_EXPR
> > > + and merge-PHIs, verifying the chain reaches PHI_RESULT within LOOP.
> > > + UNDO collects nodes added to VISITED so that a failed speculative
> > > + walk (trying op0 before op1) can be rolled back cheaply. */
> > > +
> > > +static bool
> > > +verify_chain_to_phi_p (tree name, tree phi_result, class loop *loop,
> > > + hash_set<tree> &visited,
> > > + auto_vec<tree> &undo, int depth = 0)
> > > +{
> > > + /* Iteratively follow CONVERT_EXPR chains to avoid recursion on
> > > + the common linear-cast case; recurse for PHIs and PLUS/MINUS.
> > > + The depth parameter bounds total recursion across all frames. */
> > > + for (;;)
> > > + {
> > > + if (depth > param_max_widen_accum_chain_depth)
> > > + return false;
> >
> > Possibly limit this together with (*)
> >
> > > +
> > > + if (name == phi_result)
> > > + return true;
> > > +
> > > + if (TREE_CODE (name) != SSA_NAME)
> > > + return false;
> > > +
> > > + /* Already verified on a convergent path -- ok. In-loop merge
> > > + PHIs cannot form cycles without going through the header
> > > + PHI, so a previously visited node is safe. */
> >
> > There are irreducible regions, so I don't think this is true.
> >
> > > + if (visited.add (name))
> > > + return true;
> > > + undo.safe_push (name);
> > > +
> > > + gimple *def = SSA_NAME_DEF_STMT (name);
> > > + if (!def)
> > > + return false;
> > > +
> > > + basic_block bb = gimple_bb (def);
> > > + if (!bb || !flow_bb_inside_loop_p (loop, bb))
> > > + return false;
> > > +
> > > + /* Merge-PHI inside the loop (not header): each in-loop
> > > + argument must trace back to phi_result. */
> > > + if (gimple_code (def) == GIMPLE_PHI)
> > > + {
> > > + if (bb == loop->header)
> > > + return false;
> > > + gphi *phi = as_a<gphi *> (def);
> > > + for (unsigned i = 0; i < gimple_phi_num_args (phi); i++)
> > > + {
> > > + edge e = gimple_phi_arg_edge (phi, i);
> > > + if (!flow_bb_inside_loop_p (loop, e->src))
> > > + continue;
> > > + tree arg = gimple_phi_arg_def (phi, i);
> > > + if (!verify_chain_to_phi_p (arg, phi_result, loop, visited,
> > > + undo, depth + 1))
> > > + return false;
> > > + }
> > > + return true;
> > > + }
> > > +
> > > + if (!is_gimple_assign (def))
> > > + return false;
> > > +
> > > + enum tree_code code = gimple_assign_rhs_code (def);
> > > + if (!is_additive_or_cast_p (code))
> > > + return false;
> > > +
> > > + if (CONVERT_EXPR_CODE_P (code))
> > > + {
> > > + name = gimple_assign_rhs1 (def);
> > > + depth++;
> > > + continue;
> > > + }
> > > +
> > > + /* PLUS_EXPR / MINUS_EXPR: one operand is the accumulator chain,
> > > + the other is a non-accumulator addend. Try op0 first; on
> > > + failure, roll back any nodes it added to VISITED before
> > > + trying op1 -- a polluted set could cause a false-positive
> > > + convergent-path hit. */
> > > + tree op0 = gimple_assign_rhs1 (def);
> > > + tree op1 = gimple_assign_rhs2 (def);
> > > + unsigned mark = undo.length ();
> > > + if (verify_chain_to_phi_p (op0, phi_result, loop, visited,
> > > + undo, depth + 1))
> > > + return true;
> > > + /* Roll back nodes added by the failed op0 walk. */
> > > + while (undo.length () > mark)
> > > + visited.remove (undo.pop ());
> > > + return verify_chain_to_phi_p (op1, phi_result, loop, visited,
> > > + undo, depth + 1);
> > > + }
> > > +}
> > > +
> > > +/* Validate that all in-loop uses of NAME are safe for widening:
> > > + PHIs, casts, or additive ops. Out-of-loop uses are fine since
> > > + they will get the exit-narrowed value. */
> > > +
> > > +static bool
> > > +validate_uses_in_loop_p (tree name, class loop *loop,
> > > + hash_set<tree> &visited, int depth = 0)
> > > +{
> > > + if (depth > param_max_widen_accum_chain_depth)
> > > + return false;
> > > +
> > > + if (visited.add (name))
> > > + return true;
> > > +
> > > + imm_use_iterator iter;
> > > + use_operand_p use_p;
> > > + FOR_EACH_IMM_USE_FAST (use_p, iter, name)
> > > + {
> > > + gimple *use_stmt = USE_STMT (use_p);
> > > +
> > > + if (is_gimple_debug (use_stmt))
> > > + continue;
> > > +
> > > + basic_block bb = gimple_bb (use_stmt);
> > > + if (!bb || !flow_bb_inside_loop_p (loop, bb))
> > > + continue; /* Out-of-loop use -- ok. */
> > > +
> > > + if (gimple_code (use_stmt) == GIMPLE_PHI)
> > > + {
> > > + /* For in-loop merge PHIs, recurse on the result. */
> > > + if (bb != loop->header)
> > > + {
> > > + tree phi_res = gimple_phi_result (use_stmt);
> > > + if (!validate_uses_in_loop_p (phi_res, loop, visited,
> > > + depth + 1))
> > > + return false;
> > > + }
> > > + continue;
> > > + }
> > > +
> > > + if (!is_gimple_assign (use_stmt))
> > > + return false;
> > > +
> > > + enum tree_code code = gimple_assign_rhs_code (use_stmt);
> > > + if (!is_additive_or_cast_p (code))
> > > + return false;
> > > +
> > > + tree lhs = gimple_assign_lhs (use_stmt);
> > > + if (!validate_uses_in_loop_p (lhs, loop, visited, depth + 1))
> > > + return false;
> > > + }
> > > + return true;
> > > +}
> > > +
> > > +/* Extract the preheader (init) and latch (back-edge) arguments from
> > > + a two-argument header PHI. Returns true on success. */
> > > +
> > > +static bool
> > > +get_phi_args (gphi *header_phi, class loop *loop,
> > > + tree *init_arg_out, tree *back_arg_out,
> > > + location_t *init_loc_out = NULL,
> > > + location_t *back_loc_out = NULL)
> > > +{
> > > + if (gimple_phi_num_args (header_phi) != 2)
> > > + return false;
> > > +
> > > + edge pre_edge = loop_preheader_edge (loop);
> > > + edge latch_edge = loop_latch_edge (loop);
> > > +
> > > + *init_arg_out = NULL_TREE;
> > > + *back_arg_out = NULL_TREE;
> > > +
> > > + for (unsigned i = 0; i < 2; i++)
> > > + {
> > > + edge e = gimple_phi_arg_edge (header_phi, i);
> > > + if (e == pre_edge)
> > > + {
> > > + *init_arg_out = gimple_phi_arg_def (header_phi, i);
> > > + if (init_loc_out)
> > > + *init_loc_out = gimple_phi_arg_location (header_phi, i);
> > > + }
> > > + else if (e == latch_edge)
> > > + {
> > > + *back_arg_out = gimple_phi_arg_def (header_phi, i);
> > > + if (back_loc_out)
> > > + *back_loc_out = gimple_phi_arg_location (header_phi, i);
> > > + }
> > > + }
> >
> > Huh, I think gimple_phi_arg_def_from_edge should be able to
> > simplify this. Also e->dest_idx is the PHI argument index.
> >
> > > +
> > > + return *init_arg_out && *back_arg_out;
> > > +}
> > > +
> > > +/* Top-level analysis for a candidate header PHI.
> > > + Returns true if the PHI is a narrow accumulator that can be
> > > + widened. The caller has already verified the PHI result is a
> > > + narrow integral type. */
> > > +
> > > +static bool
> > > +analyze_candidate (gphi *header_phi, class loop *loop)
> > > +{
> > > + tree phi_result = gimple_phi_result (header_phi);
> > > +
> > > + tree init_arg, back_arg;
> > > + if (!get_phi_args (header_phi, loop, &init_arg, &back_arg))
> > > + return false;
> > > +
> > > + /* Verify the back-edge argument chains back to the phi_result
> > > + through additive ops, casts, and merge-PHIs. */
> > > + hash_set<tree> chain_visited;
> > > + auto_vec<tree> undo;
> > > + if (!verify_chain_to_phi_p (back_arg, phi_result, loop, chain_visited,
> > > + undo))
> > > + {
> > > + if (dump_file && (dump_flags & TDF_DETAILS))
> > > + fprintf (dump_file, " reject: back-edge chain does not reach
> > > PHI\n");
> > > + return false;
> > > + }
> > > +
> > > + /* Validate that all in-loop uses of the phi_result are safe. */
> > > + hash_set<tree> use_visited;
> > > + if (!validate_uses_in_loop_p (phi_result, loop, use_visited))
> > > + {
> > > + if (dump_file && (dump_flags & TDF_DETAILS))
> > > + fprintf (dump_file, " reject: unsafe in-loop uses\n");
> > > + return false;
> > > + }
> > > +
> > > + return true;
> > > +}
> > > +
> > > +/* If OPERAND already has a wide mapping in NARROW_TO_WIDE, return it.
> > > + If OPERAND is an INTEGER_CST, return fold_convert to WIDE_TYPE.
> > > + Otherwise, insert a widening NOP_EXPR before GSI and return the
> > > + new wide SSA name. */
> > > +
> > > +static tree
> > > +get_or_widen (tree operand, tree wide_type,
> > > + hash_map<tree, tree> &narrow_to_wide,
> > > + gimple_stmt_iterator &gsi)
> > > +{
> > > + tree *mapped = narrow_to_wide.get (operand);
> > > + if (mapped)
> > > + return *mapped;
> > > +
> > > + if (TREE_CODE (operand) == INTEGER_CST)
> > > + return fold_convert (wide_type, operand);
> > > +
> > > + /* Insert a widening cast. */
> > > + tree wide_name = make_ssa_name (wide_type);
> > > + gassign *widen_stmt = gimple_build_assign (wide_name, NOP_EXPR, operand);
> > > + gsi_insert_before (&gsi, widen_stmt, GSI_SAME_STMT);
> > > + return wide_name;
> > > +}
> > > +
> > > +/* Recursively resolve the wide-type version of the back-edge PHI
> > > + argument. Handles:
> > > + - Names already in narrow_to_wide (direct lookup)
> > > + - Type conversions of any width (look through to source)
> > > + - Merge-PHIs inside the loop (create wide merge-PHIs) */
> > > +
> > > +static tree
> > > +resolve_wide_back_arg (tree back_arg, tree wide_type,
> > > + hash_map<tree, tree> &narrow_to_wide,
> > > + class loop *loop, int depth = 0)
> > > +{
> > > + if (depth > param_max_widen_accum_chain_depth)
> > > + return NULL_TREE;
> > > +
> > > + if (TREE_CODE (back_arg) != SSA_NAME)
> > > + return NULL_TREE;
> > > +
> > > + tree *mapped = narrow_to_wide.get (back_arg);
> > > + if (mapped)
> > > + return *mapped;
> > > +
> > > + gimple *def = SSA_NAME_DEF_STMT (back_arg);
> > > + if (!def)
> > > + return NULL_TREE;
> > > +
> > > + /* Any type conversion (same-width sign-change, narrowing truncation,
> > > + or widening cast): look through to the source operand. This
> > > + handles the int->short truncation that the worklist skips, as well
> > > + as same-width casts like signed short <-> unsigned short. The
> > > + recursion will eventually reach a name in narrow_to_wide. */
> > > + if (is_gimple_assign (def)
> > > + && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def)))
> > > + {
> > > + tree src = gimple_assign_rhs1 (def);
> > > + return resolve_wide_back_arg (src, wide_type,
> > > + narrow_to_wide, loop, depth + 1);
> > > + }
> > > +
> > > + /* Merge-PHI inside the loop. */
> > > + if (gimple_code (def) == GIMPLE_PHI)
> > > + {
> > > + gphi *merge_phi = as_a<gphi *> (def);
> > > + basic_block bb = gimple_bb (merge_phi);
> > > + if (!bb || !flow_bb_inside_loop_p (loop, bb) || bb == loop->header)
> > > + return NULL_TREE;
> > > +
> > > + tree wide_phi_result = make_ssa_name (wide_type);
> > > + gphi *new_phi = create_phi_node (wide_phi_result, bb);
> > > +
> > > + /* Map before recursion to break cycles. */
> > > + narrow_to_wide.put (back_arg, wide_phi_result);
> > > +
> > > + for (unsigned i = 0; i < gimple_phi_num_args (merge_phi); i++)
> > > + {
> > > + tree arg = gimple_phi_arg_def (merge_phi, i);
> > > + edge e = gimple_phi_arg_edge (merge_phi, i);
> > > + tree wide_arg = resolve_wide_back_arg (arg, wide_type,
> > > + narrow_to_wide, loop,
> > > + depth + 1);
> > > + if (!wide_arg)
> > > + {
> > > + /* Fall back to widening the narrow arg on this edge. */
> > > + wide_arg = make_ssa_name (wide_type);
> > > + gassign *cast_stmt
> > > + = gimple_build_assign (wide_arg, NOP_EXPR, arg);
> > > + gsi_insert_on_edge (e, cast_stmt);
> > > + }
> > > + add_phi_arg (new_phi, wide_arg, e, UNKNOWN_LOCATION);
> > > + }
> > > + return wide_phi_result;
> > > + }
> > > +
> > > + return NULL_TREE;
> > > +}
> > > +
> > > +/* Perform the widening transformation on HEADER_PHI.
> > > + Returns true on success. */
> > > +
> > > +static bool
> > > +widen_accumulator (gphi *header_phi, class loop *loop)
> > > +{
> > > + tree phi_result = gimple_phi_result (header_phi);
> > > + tree narrow_type = TREE_TYPE (phi_result);
> > > + unsigned narrow_prec = TYPE_PRECISION (narrow_type);
> > > + /* Use unsigned to ensure the wide addition wraps on overflow,
> > > + matching the wrapping semantics of the original narrow-type
> > > + arithmetic. Using signed int would introduce UB that did not
> > > + exist in the original program. */
> > > + tree wide_type = unsigned_type_node;
> > > +
> > > + /* Ensure we actually widen. */
> > > + if (TYPE_PRECISION (wide_type) <= narrow_prec)
> > > + return false;
> > > +
> > > + tree init_arg, back_arg;
> > > + location_t init_loc, back_loc;
> > > + if (!get_phi_args (header_phi, loop, &init_arg, &back_arg,
> > > + &init_loc, &back_loc))
> > > + return false;
> > > +
> > > + edge pre_edge = loop_preheader_edge (loop);
> > > + edge latch_edge = loop_latch_edge (loop);
> > > +
> > > + /* 1. Create widened init on the preheader edge. */
> > > + tree wide_init = make_ssa_name (wide_type);
> > > + gassign *init_cast = gimple_build_assign (wide_init, NOP_EXPR, init_arg);
> > > + gsi_insert_on_edge (pre_edge, init_cast);
> > > +
> > > + /* 2. Create the new wide PHI in the loop header.
> > > + Add the preheader arg now; the back-edge arg is added later. */
> > > + tree wide_phi_result = make_ssa_name (wide_type);
> > > + gphi *wide_phi = create_phi_node (wide_phi_result, loop->header);
> > > + add_phi_arg (wide_phi, wide_init, pre_edge, init_loc);
> > > +
> > > + /* 3. Map old phi_result -> wide_phi_result. */
> > > + hash_map<tree, tree> narrow_to_wide;
> > > + narrow_to_wide.put (phi_result, wide_phi_result);
> > > +
> > > + /* 4. Worklist-driven widening of the in-loop chain.
> > > + The original narrow statements are left in place; they become
> > > + dead once exit PHIs are patched below and are removed by DCE. */
> > > + auto_vec<tree> worklist;
> > > + worklist.safe_push (phi_result);
> > > +
> > > + while (!worklist.is_empty ())
> > > + {
> > > + tree old_name = worklist.pop ();
> > > + tree *wide_name_p = narrow_to_wide.get (old_name);
> > > + if (!wide_name_p)
> > > + continue;
> > > + tree wide_name = *wide_name_p;
> > > +
> > > + imm_use_iterator iter;
> > > + use_operand_p use_p;
> > > + /* Collect uses first: the widening below inserts statements
> > > + that add new uses, which is not safe while iterating with
> > > + FOR_EACH_IMM_USE_FAST. Use a hash set to deduplicate -- a
> > > + statement may appear more than once if it uses old_name in
> > > + multiple operands. */
> > > + hash_set<gimple *> seen_stmts;
> > > + auto_vec<gimple *> use_stmts;
> > > + FOR_EACH_IMM_USE_FAST (use_p, iter, old_name)
> > > + {
> > > + gimple *use_stmt = USE_STMT (use_p);
> > > + if (is_gimple_debug (use_stmt))
> > > + continue;
> > > + basic_block bb = gimple_bb (use_stmt);
> > > + if (!bb || !flow_bb_inside_loop_p (loop, bb))
> > > + continue;
> > > + if (!seen_stmts.add (use_stmt))
> > > + use_stmts.safe_push (use_stmt);
> > > + }
> > > +
> > > + unsigned i;
> > > + gimple *use_stmt;
> > > + FOR_EACH_VEC_ELT (use_stmts, i, use_stmt)
> > > + {
> > > + /* Skip merge PHIs -- handled by resolve_wide_back_arg. */
> > > + if (gimple_code (use_stmt) == GIMPLE_PHI)
> > > + continue;
> > > +
> > > + if (!is_gimple_assign (use_stmt))
> > > + continue;
> > > +
> > > + enum tree_code code = gimple_assign_rhs_code (use_stmt);
> > > + tree lhs = gimple_assign_lhs (use_stmt);
> > > +
> > > + if (CONVERT_EXPR_CODE_P (code))
> > > + {
> > > + /* Widening or same-width cast: map lhs -> wide_name.
> > > + This absorbs same-width sign-changes (e.g. short <->
> > > + unsigned short) by mapping them to the same wide SSA. */
> > > + if (is_widening_or_nop_cast_p (use_stmt, narrow_prec))
> > > + {
> > > + if (!narrow_to_wide.get (lhs))
> > > + {
> > > + narrow_to_wide.put (lhs, wide_name);
> > > + worklist.safe_push (lhs);
> > > + }
> > > + continue;
> > > + }
> > > + /* Any other cast (truly narrowing from wider type, etc.)
> > > + -- skip; the value won't be widened. */
> > > + continue;
> > > + }
> > > +
> > > + if (code == PLUS_EXPR || code == MINUS_EXPR)
> > > + {
> > > + if (narrow_to_wide.get (lhs))
> > > + continue; /* Already widened. */
> > > +
> > > + tree op0 = gimple_assign_rhs1 (use_stmt);
> > > + tree op1 = gimple_assign_rhs2 (use_stmt);
> > > +
> > > + gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
> > > +
> > > + tree w0 = get_or_widen (op0, wide_type, narrow_to_wide, gsi);
> > > + tree w1 = get_or_widen (op1, wide_type, narrow_to_wide, gsi);
> > > +
> > > + tree wide_lhs = make_ssa_name (wide_type);
> > > + gassign *wide_stmt
> > > + = gimple_build_assign (wide_lhs, code, w0, w1);
> > > + gsi_insert_after (&gsi, wide_stmt, GSI_NEW_STMT);
> > > +
> > > + narrow_to_wide.put (lhs, wide_lhs);
> > > + worklist.safe_push (lhs);
> > > + continue;
> > > + }
> > > + }
> > > + }
> > > +
> > > + /* 5. Wire the back-edge argument. */
> > > + tree wide_back = resolve_wide_back_arg (back_arg, wide_type,
> > > + narrow_to_wide, loop);
> > > + /* Analysis guarantees the back-edge chain reaches the header PHI
> > > + through additive ops, casts, and merge-PHIs -- all of which
> > > + resolve_wide_back_arg handles. A NULL here means a bug. */
> > > + gcc_assert (wide_back);
> > > + add_phi_arg (wide_phi, wide_back, latch_edge, back_loc);
> > > +
> > > + /* Ensure back_arg is in the map so exit PHI replacement finds it.
> > > + resolve_wide_back_arg maps intermediate nodes (merge-PHIs, cast
> > > + sources) but not back_arg itself. */
> > > + if (!narrow_to_wide.get (back_arg))
> > > + narrow_to_wide.put (back_arg, wide_back);
> > > +
> > > + /* 6. Insert narrowing casts at loop exits.
> > > + For each exit edge, scan its destination's PHIs for arguments
> > > + that reference any name in the widened accumulator chain. For
> > > + each match, insert a narrowing cast of the corresponding wide
> > > + value on that edge. This handles both the back-edge value and
> > > + intermediate accumulator values at early exits. */
> > > + auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > + unsigned j;
> > > + edge ex;
> > > + FOR_EACH_VEC_ELT (exits, j, ex)
> > > + {
> > > + if (ex->flags & (EDGE_ABNORMAL | EDGE_EH))
> > > + continue;
> > > +
> > > + for (gphi_iterator psi = gsi_start_phis (ex->dest);
> > > + !gsi_end_p (psi); gsi_next (&psi))
> > > + {
> > > + gphi *exit_phi = psi.phi ();
> > > + for (unsigned k = 0; k < gimple_phi_num_args (exit_phi); k++)
> > > + {
> > > + if (gimple_phi_arg_edge (exit_phi, k) != ex)
> > > + continue;
> > > + tree arg = gimple_phi_arg_def (exit_phi, k);
> > > + tree arg_type = TREE_TYPE (arg);
> > > + /* Only replace args that carry a narrow-type value.
> > > + The narrow_to_wide map may also contain int-typed
> > > + intermediates (from widening casts in the accumulator
> > > + chain); replacing those with a narrow_type cast would
> > > + create a type mismatch in the exit PHI. */
> > > + if (INTEGRAL_TYPE_P (arg_type)
> > > + && TYPE_PRECISION (arg_type) > narrow_prec)
> > > + continue;
> > > + tree *wide_p = narrow_to_wide.get (arg);
> > > + if (!wide_p)
> > > + continue;
> > > +
> > > + tree narrow_exit = make_ssa_name (narrow_type);
> > > + gassign *exit_cast
> > > + = gimple_build_assign (narrow_exit, NOP_EXPR, *wide_p);
> > > + gsi_insert_on_edge (ex, exit_cast);
> > > + SET_PHI_ARG_DEF (exit_phi, k, narrow_exit);
> > > + }
> > > + }
> > > + }
> > > +
> > > + if (dump_file && (dump_flags & TDF_DETAILS))
> > > + {
> > > + fprintf (dump_file, "Accumulator widened successfully: ");
> > > + print_generic_expr (dump_file, phi_result, TDF_SLIM);
> > > + fprintf (dump_file, " in loop %d\n", loop->num);
> > > + }
> > > +
> > > + return true;
> > > +}
> > > +
> > > +namespace {
> > > +
> > > +const pass_data pass_data_widen_accumulator =
> > > +{
> > > + GIMPLE_PASS, /* type */
> > > + "widen_accum", /* name */
> > > + OPTGROUP_LOOP, /* optinfo_flags */
> > > + TV_TREE_WIDEN_ACCUM, /* tv_id */
> > > + ( PROP_cfg | PROP_ssa ), /* properties_required */
> > > + 0, /* properties_provided */
> > > + 0, /* properties_destroyed */
> > > + 0, /* todo_flags_start */
> > > + TODO_cleanup_cfg, /* todo_flags_finish */
> > > +};
> > > +
> > > +class pass_widen_accumulator : public gimple_opt_pass
> > > +{
> > > +public:
> > > + pass_widen_accumulator (gcc::context *ctxt)
> > > + : gimple_opt_pass (pass_data_widen_accumulator, ctxt) {}
> > > +
> > > + bool gate (function *) final override
> > > + {
> > > + return flag_tree_widen_accum && !flag_trapv;
> > > + }
> > > +
> > > + unsigned int execute (function *) final override;
> > > +};
> > > +
> > > +unsigned int
> > > +pass_widen_accumulator::execute (function *fun)
> > > +{
> > > + bool changed = false;
> > > +
> > > + loop_optimizer_init (LOOPS_NORMAL);
> > > +
> > > + if (number_of_loops (fun) <= 1)
> > > + {
> > > + loop_optimizer_finalize ();
> > > + return 0;
> > > + }
> > > +
> > > + for (auto loop : loops_list (fun, LI_FROM_INNERMOST))
> > > + {
> > > + for (gphi_iterator gsi = gsi_start_phis (loop->header);
> > > + !gsi_end_p (gsi); gsi_next (&gsi))
> > > + {
> > > + gphi *phi = gsi.phi ();
> > > + tree result = gimple_phi_result (phi);
> > > +
> > > + if (virtual_operand_p (result))
> > > + continue;
> > > +
> > > + if (!INTEGRAL_TYPE_P (TREE_TYPE (result)))
> > > + continue;
> > > +
> > > + if (TYPE_PRECISION (TREE_TYPE (result))
> > > + >= TYPE_PRECISION (integer_type_node))
> > > + continue;
> >
> > This seems like it should check against BITS_PER_WORD instead, no?
> >
> > > +
> > > + if (dump_file && (dump_flags & TDF_DETAILS))
> > > + {
> > > + fprintf (dump_file,
> > > + "\nExamining narrow PHI (prec=%u) in loop %d:\n",
> > > + TYPE_PRECISION (TREE_TYPE (result)), loop->num);
> > > + print_gimple_stmt (dump_file, phi, 0, TDF_SLIM);
> > > + }
> > > +
> > > + if (analyze_candidate (phi, loop))
> >
> > (*) this, that is, there's no limit on the number of PHIs we analyze
> > and the chains can cross multiple PHIs if you consider
> >
> > _1 = PHI (init, _2);
> > _2 = PHI (init, _3);
> > _3 = PHI (init, ...);
> >
> > which would make this quadratic and not linear in the number of
> > loop statements.
> >
> > > + {
> > > + if (dump_file && (dump_flags & TDF_DETAILS))
> > > + {
> > > + fprintf (dump_file,
> > > + "\nCandidate accumulator PHI in loop %d:\n",
> > > + loop->num);
> > > + print_gimple_stmt (dump_file, phi, 0, TDF_SLIM);
> > > + }
> > > + if (widen_accumulator (phi, loop))
> > > + changed = true;
> > > + }
> > > + }
> > > + }
> > > +
> > > + if (changed)
> > > + {
> > > + gsi_commit_edge_inserts ();
> > > + /* Rewrite into loop-closed SSA so that subsequent passes that
> > > + expect LCSSA form (e.g. pass_optimize_widening_mul) see
> > > + correct PHI nodes at loop exits for the new wide names. */
> > > + rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa);
> >
> > It should be relatively easy to maintain LC SSA form directly, also this
> > pass isn't in the loop pipeline, so why bother at all?
> >
> > ISTR Micha has done work on general promotion, so CCing him.
> >
> > Richard.
> >
> > > + }
> > > +
> > > + loop_optimizer_finalize ();
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +} // anonymous namespace
> > > +
> > > +gimple_opt_pass *
> > > +make_pass_widen_accumulator (gcc::context *ctxt)
> > > +{
> > > + return new pass_widen_accumulator (ctxt);
> > > +}
> > > --
> > > 2.34.1
> > >