Re: [PATCH v2] tree-optimization: Fix profile update in loop splitting (initial_true=false)

Richard Biener Fri, 15 May 2026 01:04:36 -0700

On Tue, 12 May 2026, Xin Wang wrote:

> When split_loop does iteration space splitting, split_at_bb_p may
> swap the guard condition so that operand 0 is always the loop IV
> and operand 1 is the invariant.  For example, "t < i" (LT_EXPR)
> becomes "i > t" (GT_EXPR).  This can cause initial_true to be
> false, meaning loop1 handles iterations where the guard is false
> and loop2 handles iterations where the guard is true.
> 
> The profile update code used true_edge->probability for loop1 and
> its inverse for loop2.  That is correct only when loop1 keeps the
> true branch.  When initial_true is false, loop1 keeps the false
> branch and loop2 keeps the true branch, so all profile quantities
> must follow those semantic edges instead of the raw true/false edge
> names.
> 
> Derive loop1_edge and loop2_edge once from initial_true and use them
> consistently for loop_version's then probability, the split-loop BB
> count scaling, and the iteration estimate scaling.  Also make
> fix_loop_bb_probability take loop1/loop2 edges explicitly, rather
> than assuming its arguments are true/false edges.
> 
> The bug caused BB counts in the split loops to be swapped when
> initial_true is false: the loop body whose guard is forced false
> (loop1, executing fewer iterations) would get the higher profile
> count, and vice versa.  It could also leave the precondition edge
> probabilities and iteration estimates based on the wrong split edge.


OK - much better to follow after the renaming.

Thanks,
Richard.

> gcc/ChangeLog:
> 
>       * tree-ssa-loop-split.cc (fix_loop_bb_probability): Rename
>       parameters from true_edge/false_edge to loop1_edge/loop2_edge
>       and scale both loops directly from their semantic edge
>       probabilities.
>       (split_loop): Derive loop1_edge and loop2_edge from initial_true.
>       Use them for loop1_prob, fix_loop_bb_probability, and iteration
>       estimate scaling.
> 
> gcc/testsuite/ChangeLog:
> 
>       * gcc.dg/tree-prof/loop-split-4.c: New test.
> 
> Signed-off-by: Xin Wang <[email protected]>
> ---
> Changes in v2:
> - loop1_prob had the same true/false edge mismatch when initial_true
>   is false.
> - Derive loop1_edge and loop2_edge once and use them consistently for
>   loop1_prob, fix_loop_bb_probability and iteration estimate scaling.
> - Rename fix_loop_bb_probability parameters to reflect loop1/loop2
>   semantics rather than raw true/false edge semantics.
> - Extend the testcase to check the initial_true=false path, the corrected
>   loop counts, and the high-probability skip edge to loop2.
> 
>  gcc/testsuite/gcc.dg/tree-prof/loop-split-4.c | 38 +++++++++++++
>  gcc/tree-ssa-loop-split.cc                    | 54 ++++++++++++-------
>  2 files changed, 72 insertions(+), 20 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-prof/loop-split-4.c
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/loop-split-4.c 
> b/gcc/testsuite/gcc.dg/tree-prof/loop-split-4.c
> new file mode 100644
> index 00000000000..1c67e30f4de
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-prof/loop-split-4.c
> @@ -0,0 +1,38 @@
> +/* PR tree-optimization/XXXXX */
> +/* { dg-options "-O2 -fdump-tree-lsplit-details" } */
> +
> +volatile int sink;
> +
> +__attribute__((noinline)) int
> +helper (int a, int b)
> +{
> +  return a + b;
> +}
> +
> +int
> +main (void)
> +{
> +  int n = 100, t = 3, total = 0;
> +  /* With t=3, n=100 the guard "t < i" is false for i=0..3 (4 iterations,
> +     empty else) and true for i=4..99 (96 iterations, calls helper).
> +     split_at_bb_p swaps "t < i" to "i > t" (GT_EXPR), giving
> +     initial_true = false.  Loop1 (cold, 4 iterations) handles the false
> +     case, loop2 (hot, 96 iterations) handles the true case.
> +     Without the fix loop1 was scaled by true_edge->probability (96%),
> +     inverting the counts.  */
> +  for (int i = 0; i < n; i++)
> +    if (t < i)
> +      total += helper (i, t);
> +  sink = total;
> +  return 0;
> +}
> +/* { dg-final-use-not-autofdo { scan-tree-dump-times "Loop split" 1 "lsplit" 
> } } */
> +/* { dg-final-use-not-autofdo { scan-tree-dump-times "Invalid sum" 0 
> "lsplit" } } */
> +/* { dg-final-use-not-autofdo { scan-tree-dump "Split loop: initial_true 
> false" "lsplit" } } */
> +/* With the fix loop1 (cold, 4 iterations) count ~4, loop2 (hot, 96
> +   iterations) count ~96.  Without the fix the counts are inverted.
> +   Check loop1 is a single-digit and loop2 is 90+.  */
> +/* { dg-final-use-not-autofdo { scan-tree-dump "loop1 count \[0-9\], loop2 
> count 9\[0-9\]" "lsplit" } } */
> +/* The skip edge from loop1 to loop2 should use loop1_prob.invert (),
> +   where loop1_prob corresponds to the false edge when initial_true is 
> false.  */
> +/* { dg-final-use-not-autofdo { scan-tree-dump "goto <bb \[0-9\]+>; .99\\." 
> "lsplit" } } */
> diff --git a/gcc/tree-ssa-loop-split.cc b/gcc/tree-ssa-loop-split.cc
> index ba6cc45d7f0..477f09bf73d 100644
> --- a/gcc/tree-ssa-loop-split.cc
> +++ b/gcc/tree-ssa-loop-split.cc
> @@ -524,37 +524,38 @@ compute_new_first_bound (gimple_seq *stmts, class 
> tree_niter_desc *niter,
>  }
>  
>  /* Fix the two loop's bb count after split based on the split edge 
> probability,
> -   don't adjust the bbs dominated by true branches of that loop to avoid
> +   don't adjust the bbs dominated by the branch kept in that loop to avoid
>     dropping 1s down.  */
>  static void
> -fix_loop_bb_probability (class loop *loop1, class loop *loop2, edge 
> true_edge,
> -                      edge false_edge)
> +fix_loop_bb_probability (class loop *loop1, class loop *loop2, edge 
> loop1_edge,
> +                      edge loop2_edge)
>  {
> -  /* Proportion first loop's bb counts except those dominated by true
> -     branch to avoid drop 1s down.  */
> +  /* Proportion first loop's bb counts except those dominated by the
> +     branch that stays true/false in the first loop, to avoid dropping
> +     1s down.  */
>    basic_block *bbs1, *bbs2;
>    bbs1 = get_loop_body (loop1);
>    unsigned j;
>    for (j = 0; j < loop1->num_nodes; j++)
>      if (bbs1[j] == loop1->latch
> -     /* Watch for case where the true conditional is empty.  */
> -     || !single_pred_p (true_edge->dest)
> -     || !dominated_by_p (CDI_DOMINATORS, bbs1[j], true_edge->dest))
> +     /* Watch for case where the kept conditional arm is empty.  */
> +     || !single_pred_p (loop1_edge->dest)
> +     || !dominated_by_p (CDI_DOMINATORS, bbs1[j], loop1_edge->dest))
>        bbs1[j]->count
> -     = bbs1[j]->count.apply_probability (true_edge->probability);
> +     = bbs1[j]->count.apply_probability (loop1_edge->probability);
>    free (bbs1);
>  
> -  /* Proportion second loop's bb counts except those dominated by false
> -     branch to avoid drop 1s down.  */
> -  basic_block bbi_copy = get_bb_copy (false_edge->dest);
> +  /* Proportion second loop's bb counts except those dominated by the
> +     opposite branch, which is kept in the second loop.  */
> +  basic_block bbi_copy = get_bb_copy (loop2_edge->dest);
>    bbs2 = get_loop_body (loop2);
>    for (j = 0; j < loop2->num_nodes; j++)
>      if (bbs2[j] == loop2->latch
> -     /* Watch for case where the flase conditional is empty.  */
> +     /* Watch for case where the kept conditional arm is empty.  */
>       || !single_pred_p (bbi_copy)
>       || !dominated_by_p (CDI_DOMINATORS, bbs2[j], bbi_copy))
>        bbs2[j]->count
> -     = bbs2[j]->count.apply_probability (true_edge->probability.invert ());
> +     = bbs2[j]->count.apply_probability (loop2_edge->probability);
>    free (bbs2);
>  }
>  
> @@ -678,6 +679,8 @@ split_loop (class loop *loop1)
>  
>       edge true_edge, false_edge;
>       extract_true_false_edges_from_block (bbs[i], &true_edge, &false_edge);
> +     edge loop1_edge = initial_true ? true_edge : false_edge;
> +     edge loop2_edge = initial_true ? false_edge : true_edge;
>  
>       /* Now version the loop, placing loop2 after loop1 connecting
>          them, and fix up SSA form for that.  */
> @@ -686,7 +689,7 @@ split_loop (class loop *loop1)
>  
>       profile_probability loop1_prob
>         = integer_onep (cond) ? profile_probability::always ()
> -                             : true_edge->probability;
> +                             : loop1_edge->probability;
>       /* TODO: It is commonly a case that we know that both loops will be
>          entered.  very_likely below is the probability that second loop will
>          be entered given by connect_loops.  We should work out the common
> @@ -712,7 +715,16 @@ split_loop (class loop *loop1)
>               (loop_preheader_edge (loop2)->src)->probability
>                       = loop1_prob.invert ();
>  
> -     fix_loop_bb_probability (loop1, loop2, true_edge, false_edge);
> +     fix_loop_bb_probability (loop1, loop2, loop1_edge, loop2_edge);
> +
> +     if (dump_file && (dump_flags & TDF_DETAILS))
> +       fprintf (dump_file,
> +                ";; Split loop: initial_true %s, "
> +                "loop1 count %" PRId64 ", loop2 count %" PRId64 "\n",
> +                initial_true ? "true" : "false",
> +                (int64_t) loop1->header->count.to_gcov_type (),
> +                (int64_t) loop2->header->count.to_gcov_type ());
> +
>       /* If conditional we split on has reliable profilea nd both
>          preconditionals of loop1 and loop2 are constant true, we can
>          only redistribute the iteration counts to the split loops.
> @@ -730,10 +742,12 @@ split_loop (class loop *loop1)
>       if (loop1->any_estimate
>           && wi::fits_shwi_p (loop1->nb_iterations_estimate))
>         {
> -         sreal scale = true_edge->probability.reliable_p ()
> -                       ? true_edge->probability.to_sreal () : (sreal)1;
> -         sreal scale2 = false_edge->probability.reliable_p ()
> -                       ? false_edge->probability.to_sreal () : (sreal)1;
> +         sreal scale
> +           = loop1_edge->probability.reliable_p ()
> +             ? loop1_edge->probability.to_sreal () : (sreal)1;
> +         sreal scale2
> +           = loop2_edge->probability.reliable_p ()
> +             ? loop2_edge->probability.to_sreal () : (sreal)1;
>           sreal div1 = loop1_prob.initialized_p ()
>                        ? loop1_prob.to_sreal () : (sreal)1/(sreal)2;
>           /* +1 to get header interations rather than latch iterations and 
> then
> 

-- 
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Jochen Jaser, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Re: [PATCH v2] tree-optimization: Fix profile update in loop splitting (initial_true=false)

Reply via email to