[PATCH v2] tree-optimization: Fix profile update in loop splitting (initial_true=false)

Xin Wang Tue, 12 May 2026 05:46:17 -0700

When split_loop does iteration space splitting, split_at_bb_p may
swap the guard condition so that operand 0 is always the loop IV
and operand 1 is the invariant.  For example, "t < i" (LT_EXPR)
becomes "i > t" (GT_EXPR).  This can cause initial_true to be
false, meaning loop1 handles iterations where the guard is false
and loop2 handles iterations where the guard is true.


The profile update code used true_edge->probability for loop1 and
its inverse for loop2.  That is correct only when loop1 keeps the
true branch.  When initial_true is false, loop1 keeps the false
branch and loop2 keeps the true branch, so all profile quantities
must follow those semantic edges instead of the raw true/false edge
names.

Derive loop1_edge and loop2_edge once from initial_true and use them
consistently for loop_version's then probability, the split-loop BB
count scaling, and the iteration estimate scaling.  Also make
fix_loop_bb_probability take loop1/loop2 edges explicitly, rather
than assuming its arguments are true/false edges.

The bug caused BB counts in the split loops to be swapped when
initial_true is false: the loop body whose guard is forced false
(loop1, executing fewer iterations) would get the higher profile
count, and vice versa.  It could also leave the precondition edge
probabilities and iteration estimates based on the wrong split edge.

gcc/ChangeLog:

        * tree-ssa-loop-split.cc (fix_loop_bb_probability): Rename
        parameters from true_edge/false_edge to loop1_edge/loop2_edge
        and scale both loops directly from their semantic edge
        probabilities.
        (split_loop): Derive loop1_edge and loop2_edge from initial_true.
        Use them for loop1_prob, fix_loop_bb_probability, and iteration
        estimate scaling.

gcc/testsuite/ChangeLog:

        * gcc.dg/tree-prof/loop-split-4.c: New test.

Signed-off-by: Xin Wang <[email protected]>
---
Changes in v2:
- loop1_prob had the same true/false edge mismatch when initial_true
  is false.
- Derive loop1_edge and loop2_edge once and use them consistently for
  loop1_prob, fix_loop_bb_probability and iteration estimate scaling.
- Rename fix_loop_bb_probability parameters to reflect loop1/loop2
  semantics rather than raw true/false edge semantics.
- Extend the testcase to check the initial_true=false path, the corrected
  loop counts, and the high-probability skip edge to loop2.

 gcc/testsuite/gcc.dg/tree-prof/loop-split-4.c | 38 +++++++++++++
 gcc/tree-ssa-loop-split.cc                    | 54 ++++++++++++-------
 2 files changed, 72 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-prof/loop-split-4.c

diff --git a/gcc/testsuite/gcc.dg/tree-prof/loop-split-4.c 
b/gcc/testsuite/gcc.dg/tree-prof/loop-split-4.c
new file mode 100644
index 00000000000..1c67e30f4de
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-prof/loop-split-4.c
@@ -0,0 +1,38 @@
+/* PR tree-optimization/XXXXX */
+/* { dg-options "-O2 -fdump-tree-lsplit-details" } */
+
+volatile int sink;
+
+__attribute__((noinline)) int
+helper (int a, int b)
+{
+  return a + b;
+}
+
+int
+main (void)
+{
+  int n = 100, t = 3, total = 0;
+  /* With t=3, n=100 the guard "t < i" is false for i=0..3 (4 iterations,
+     empty else) and true for i=4..99 (96 iterations, calls helper).
+     split_at_bb_p swaps "t < i" to "i > t" (GT_EXPR), giving
+     initial_true = false.  Loop1 (cold, 4 iterations) handles the false
+     case, loop2 (hot, 96 iterations) handles the true case.
+     Without the fix loop1 was scaled by true_edge->probability (96%),
+     inverting the counts.  */
+  for (int i = 0; i < n; i++)
+    if (t < i)
+      total += helper (i, t);
+  sink = total;
+  return 0;
+}
+/* { dg-final-use-not-autofdo { scan-tree-dump-times "Loop split" 1 "lsplit" } 
} */
+/* { dg-final-use-not-autofdo { scan-tree-dump-times "Invalid sum" 0 "lsplit" 
} } */
+/* { dg-final-use-not-autofdo { scan-tree-dump "Split loop: initial_true 
false" "lsplit" } } */
+/* With the fix loop1 (cold, 4 iterations) count ~4, loop2 (hot, 96
+   iterations) count ~96.  Without the fix the counts are inverted.
+   Check loop1 is a single-digit and loop2 is 90+.  */
+/* { dg-final-use-not-autofdo { scan-tree-dump "loop1 count \[0-9\], loop2 
count 9\[0-9\]" "lsplit" } } */
+/* The skip edge from loop1 to loop2 should use loop1_prob.invert (),
+   where loop1_prob corresponds to the false edge when initial_true is false.  
*/
+/* { dg-final-use-not-autofdo { scan-tree-dump "goto <bb \[0-9\]+>; .99\\." 
"lsplit" } } */
diff --git a/gcc/tree-ssa-loop-split.cc b/gcc/tree-ssa-loop-split.cc
index ba6cc45d7f0..477f09bf73d 100644
--- a/gcc/tree-ssa-loop-split.cc
+++ b/gcc/tree-ssa-loop-split.cc
@@ -524,37 +524,38 @@ compute_new_first_bound (gimple_seq *stmts, class 
tree_niter_desc *niter,
 }
 
 /* Fix the two loop's bb count after split based on the split edge probability,
-   don't adjust the bbs dominated by true branches of that loop to avoid
+   don't adjust the bbs dominated by the branch kept in that loop to avoid
    dropping 1s down.  */
 static void
-fix_loop_bb_probability (class loop *loop1, class loop *loop2, edge true_edge,
-                        edge false_edge)
+fix_loop_bb_probability (class loop *loop1, class loop *loop2, edge loop1_edge,
+                        edge loop2_edge)
 {
-  /* Proportion first loop's bb counts except those dominated by true
-     branch to avoid drop 1s down.  */
+  /* Proportion first loop's bb counts except those dominated by the
+     branch that stays true/false in the first loop, to avoid dropping
+     1s down.  */
   basic_block *bbs1, *bbs2;
   bbs1 = get_loop_body (loop1);
   unsigned j;
   for (j = 0; j < loop1->num_nodes; j++)
     if (bbs1[j] == loop1->latch
-       /* Watch for case where the true conditional is empty.  */
-       || !single_pred_p (true_edge->dest)
-       || !dominated_by_p (CDI_DOMINATORS, bbs1[j], true_edge->dest))
+       /* Watch for case where the kept conditional arm is empty.  */
+       || !single_pred_p (loop1_edge->dest)
+       || !dominated_by_p (CDI_DOMINATORS, bbs1[j], loop1_edge->dest))
       bbs1[j]->count
-       = bbs1[j]->count.apply_probability (true_edge->probability);
+       = bbs1[j]->count.apply_probability (loop1_edge->probability);
   free (bbs1);
 
-  /* Proportion second loop's bb counts except those dominated by false
-     branch to avoid drop 1s down.  */
-  basic_block bbi_copy = get_bb_copy (false_edge->dest);
+  /* Proportion second loop's bb counts except those dominated by the
+     opposite branch, which is kept in the second loop.  */
+  basic_block bbi_copy = get_bb_copy (loop2_edge->dest);
   bbs2 = get_loop_body (loop2);
   for (j = 0; j < loop2->num_nodes; j++)
     if (bbs2[j] == loop2->latch
-       /* Watch for case where the flase conditional is empty.  */
+       /* Watch for case where the kept conditional arm is empty.  */
        || !single_pred_p (bbi_copy)
        || !dominated_by_p (CDI_DOMINATORS, bbs2[j], bbi_copy))
       bbs2[j]->count
-       = bbs2[j]->count.apply_probability (true_edge->probability.invert ());
+       = bbs2[j]->count.apply_probability (loop2_edge->probability);
   free (bbs2);
 }
 
@@ -678,6 +679,8 @@ split_loop (class loop *loop1)
 
        edge true_edge, false_edge;
        extract_true_false_edges_from_block (bbs[i], &true_edge, &false_edge);
+       edge loop1_edge = initial_true ? true_edge : false_edge;
+       edge loop2_edge = initial_true ? false_edge : true_edge;
 
        /* Now version the loop, placing loop2 after loop1 connecting
           them, and fix up SSA form for that.  */
@@ -686,7 +689,7 @@ split_loop (class loop *loop1)
 
        profile_probability loop1_prob
          = integer_onep (cond) ? profile_probability::always ()
-                               : true_edge->probability;
+                               : loop1_edge->probability;
        /* TODO: It is commonly a case that we know that both loops will be
           entered.  very_likely below is the probability that second loop will
           be entered given by connect_loops.  We should work out the common
@@ -712,7 +715,16 @@ split_loop (class loop *loop1)
                (loop_preheader_edge (loop2)->src)->probability
                        = loop1_prob.invert ();
 
-       fix_loop_bb_probability (loop1, loop2, true_edge, false_edge);
+       fix_loop_bb_probability (loop1, loop2, loop1_edge, loop2_edge);
+
+       if (dump_file && (dump_flags & TDF_DETAILS))
+         fprintf (dump_file,
+                  ";; Split loop: initial_true %s, "
+                  "loop1 count %" PRId64 ", loop2 count %" PRId64 "\n",
+                  initial_true ? "true" : "false",
+                  (int64_t) loop1->header->count.to_gcov_type (),
+                  (int64_t) loop2->header->count.to_gcov_type ());
+
        /* If conditional we split on has reliable profilea nd both
           preconditionals of loop1 and loop2 are constant true, we can
           only redistribute the iteration counts to the split loops.
@@ -730,10 +742,12 @@ split_loop (class loop *loop1)
        if (loop1->any_estimate
            && wi::fits_shwi_p (loop1->nb_iterations_estimate))
          {
-           sreal scale = true_edge->probability.reliable_p ()
-                         ? true_edge->probability.to_sreal () : (sreal)1;
-           sreal scale2 = false_edge->probability.reliable_p ()
-                         ? false_edge->probability.to_sreal () : (sreal)1;
+           sreal scale
+             = loop1_edge->probability.reliable_p ()
+               ? loop1_edge->probability.to_sreal () : (sreal)1;
+           sreal scale2
+             = loop2_edge->probability.reliable_p ()
+               ? loop2_edge->probability.to_sreal () : (sreal)1;
            sreal div1 = loop1_prob.initialized_p ()
                         ? loop1_prob.to_sreal () : (sreal)1/(sreal)2;
            /* +1 to get header interations rather than latch iterations and 
then
-- 
2.33.0

[PATCH v2] tree-optimization: Fix profile update in loop splitting (initial_true=false)

Reply via email to