On Thu, 14 May 2026, liuhongt wrote:

> Before this patch, init_loop_unswitch_info always walked outwards through
> a singly-nested loop chain (up to max-unswitch-depth) and selected the
> outermost loop as the unswitching candidate.  That heuristic assumes
> unswitching the outer loop is always preferable, but it can cause large
> code growth when the outer loop body is much bigger than the inner one:
> the entire outer body gets duplicated for each guarded version, even
> though the unswitched condition only matters inside the inner loop.
> 
> Switch to a guarded ascent: only keep walking out when the outer
> candidate is not much larger than the current inner loop, or when its
> size still fits within max-unswitch-insns.  Concretely, stop ascending
> when the outer is both more than twice the size of the inner and larger
> than max-unswitch-insns.  This preserves outer-loop unswitching for the
> common case of small outer loops, while preventing the pathological
> duplication when the outer body dominates the nest.
> 
> This patch fixes a regression in SPEC2026 772.marian_r introduced by 
> r13-3875-g9e11ceef165bc0
> 
> In the benchmark, the inner loop contains two predicates. After the guilty 
> commit, only one of them is hoisted out via unswitching; the second fails to 
> be hoisted because it exceeds param_max_unswitch_insns, and causes miss 
> vectorization. The outermost loop is very large — more than twice the size of 
> its inner loop — which makes unswitching it especially costly.
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
> Ok for trunk?
> 
> gcc/ChangeLog:
> 
>       * tree-ssa-loop-unswitch.cc (estimate_loop_insns): New function.
>       (init_loop_unswitch_info): Do not select an outer loop for
>       unswitching when it is both more than twice the size of the inner
>       loop and larger than max-unswitch-insns.
> 
> gcc/testsuite/ChangeLog:
> 
>       * gcc.dg/loop-unswitch-19.c: New test.
> ---
>  gcc/testsuite/gcc.dg/loop-unswitch-19.c | 42 +++++++++++++++++++++++
>  gcc/tree-ssa-loop-unswitch.cc           | 44 ++++++++++++++++++++++---
>  2 files changed, 82 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/loop-unswitch-19.c
> 
> diff --git a/gcc/testsuite/gcc.dg/loop-unswitch-19.c 
> b/gcc/testsuite/gcc.dg/loop-unswitch-19.c
> new file mode 100644
> index 00000000000..24d450bd7a8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/loop-unswitch-19.c
> @@ -0,0 +1,42 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-unswitch-optimized" } */
> +
> +extern volatile int sink;
> +
> +void
> +foo (int flag, int n, int m)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      /* Pad the outer loop to make it noticeably larger than the inner.  */
> +      sink += i;
> +      sink += 2;
> +      sink += 3;
> +      sink += 4;
> +      sink += 5;
> +      sink += 6;
> +      sink += 7;
> +      sink += 8;
> +      sink += 9;
> +      sink += 10;
> +      sink += 11;
> +      sink += 12;
> +      sink += 13;
> +      sink += 14;
> +      sink += 15;
> +      sink += 16;
> +      sink += 17;
> +      sink += 18;
> +      sink += 19;
> +      sink += 20;
> +
> +      for (int j = 0; j < m; ++j)
> +        if (flag)
> +          sink += j;
> +    }
> +}
> +
> +/* We should still unswitch, but stay on the inner loop because the outer
> +   body is much larger than the inner and exceeds the unswitch size cap.  */
> +/* { dg-final { scan-tree-dump "unswitching loop" "unswitch" } } */
> +/* { dg-final { scan-tree-dump-not "unswitching outer loop" "unswitch" } } */
> diff --git a/gcc/tree-ssa-loop-unswitch.cc b/gcc/tree-ssa-loop-unswitch.cc
> index 96680bc64ea..06763b13911 100644
> --- a/gcc/tree-ssa-loop-unswitch.cc
> +++ b/gcc/tree-ssa-loop-unswitch.cc
> @@ -250,6 +250,21 @@ set_predicates_for_bb (basic_block bb, 
> vec<unswitch_predicate *> predicates)
>    bb_predicates->safe_push (predicates);
>  }
>  
> +/* Estimate number of instructions in LOOP using eni_size_weights.  */
> +
> +static unsigned
> +estimate_loop_insns (class loop *loop)
> +{
> +  unsigned insns = 0;
> +  basic_block *body = get_loop_body (loop);
> +  for (unsigned i = 0; i < loop->num_nodes; i++)
> +    for (gimple_stmt_iterator gsi = gsi_start_bb (body[i]);
> +      !gsi_end_p (gsi); gsi_next (&gsi))
> +      insns += estimate_num_insns (gsi_stmt (gsi), &eni_size_weights);
> +  free (body);
> +  return insns;
> +}
> +
>  /* Initialize LOOP information reused during the unswitching pass.
>     Return total number of instructions in the loop.  Adjusts LOOP to
>     the outermost loop all candidates are invariant in.  */
> @@ -266,9 +281,31 @@ init_loop_unswitch_info (class loop *&loop, 
> unswitch_predicate *&hottest,
>    class loop *outer_loop = loop;
>    unsigned max_depth = param_max_unswitch_depth;
>    while (loop_outer (outer_loop)->num != 0
> -      && !loop_outer (outer_loop)->inner->next
> -      && --max_depth != 0)
> -    outer_loop = loop_outer (outer_loop);
> +      && !loop_outer (outer_loop)->inner->next)
> +    {
> +      if (--max_depth == 0)
> +     break;
> +
> +      class loop *candidate = loop_outer (outer_loop);
> +      unsigned inner_size = estimate_loop_insns (outer_loop);
> +      unsigned candidate_size = estimate_loop_insns (candidate);
> +
> +      /* Avoid selecting large outer loops when they are also much bigger
> +      than the current inner loop.  */
> +      if (candidate_size > 2 * inner_size
> +       && candidate_size > (unsigned) param_max_unswitch_insns)

I was thinking on only testing against param_max_unswitch_insns?
Given we're only unswitching on innermost predicates but we
try to accurately cost remaining stmts on each side we could
improve the estimate by considering the outer loop to be
duplicated but not the innermost one, so check

       candidate_size - innermost_loop_size > (unsigned) 
param_max_unswitch_insns

?  That said, your estimate_loop_insns is quadratic in loop depth
(but that's bound to param_max_unswitch_depth and with this new
size check, so possibly OK ...).  In your patch you can at least
avoid one estimate_loop_insns () call, with my proposal you'd hoist
innermost_loop_size and keep one.

There is of course the possibility that the outer loops contain
the same predicates (and we'd properly cost that I think), not
sure how likely and how important that is to handle (in extreme
there'll be no actual code duplication but the loop control).
Thus instead of restricting outer loop here we could handle
this in tree_unswitch_single_loop and handle the case
when predicate == 0 by recursing to loop->inner (possibly
only when we blowed budged - we'll have to restore the
original budget then, of course).

Richard.

> +     {
> +       if (dump_enabled_p ())
> +         dump_printf_loc (MSG_NOTE, find_loop_location (loop),
> +                          "Not unswitching outer loop, size %u > 2x inner "
> +                          "%u and exceeds max-unswitch-insns %u\n",
> +                          candidate_size, inner_size,
> +                          (unsigned) param_max_unswitch_insns);
> +       break;
> +     }
> +
> +      outer_loop = candidate;
> +    }
>    hottest = NULL;
>    hottest_bb = NULL;
>    /* Find all unswitching candidates in the innermost loop.  */
> @@ -1678,4 +1715,3 @@ make_pass_tree_unswitch (gcc::context *ctxt)
>    return new pass_tree_unswitch (ctxt);
>  }
>  
> -
> 

-- 
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Jochen Jaser, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to