[gcc r17-581] [PATCH v2] tree-optimization: Fix profile update in loop splitting (initial_true=false)

Jeff Law via Gcc-cvs Mon, 18 May 2026 15:00:51 -0700

https://gcc.gnu.org/g:5e33bbf6faefa4f848e47e2a3bfec1a971d19cf2


commit r17-581-g5e33bbf6faefa4f848e47e2a3bfec1a971d19cf2
Author: Xin Wang <[email protected]>
Date:   Mon May 18 16:00:15 2026 -0600

    [PATCH v2] tree-optimization: Fix profile update in loop splitting 
(initial_true=false)
    
    When split_loop does iteration space splitting, split_at_bb_p may
    swap the guard condition so that operand 0 is always the loop IV
    and operand 1 is the invariant.  For example, "t < i" (LT_EXPR)
    becomes "i > t" (GT_EXPR).  This can cause initial_true to be
    false, meaning loop1 handles iterations where the guard is false
    and loop2 handles iterations where the guard is true.
    
    The profile update code used true_edge->probability for loop1 and
    its inverse for loop2.  That is correct only when loop1 keeps the
    true branch.  When initial_true is false, loop1 keeps the false
    branch and loop2 keeps the true branch, so all profile quantities
    must follow those semantic edges instead of the raw true/false edge
    names.
    
    Derive loop1_edge and loop2_edge once from initial_true and use them
    consistently for loop_version's then probability, the split-loop BB
    count scaling, and the iteration estimate scaling.  Also make
    fix_loop_bb_probability take loop1/loop2 edges explicitly, rather
    than assuming its arguments are true/false edges.
    
    The bug caused BB counts in the split loops to be swapped when
    initial_true is false: the loop body whose guard is forced false
    (loop1, executing fewer iterations) would get the higher profile
    count, and vice versa.  It could also leave the precondition edge
    probabilities and iteration estimates based on the wrong split edge.
    
    gcc/ChangeLog:
    
            * tree-ssa-loop-split.cc (fix_loop_bb_probability): Rename
            parameters from true_edge/false_edge to loop1_edge/loop2_edge
            and scale both loops directly from their semantic edge
            probabilities.
            (split_loop): Derive loop1_edge and loop2_edge from initial_true.
            Use them for loop1_prob, fix_loop_bb_probability, and iteration
            estimate scaling.
    
    gcc/testsuite/ChangeLog:
    
            * gcc.dg/tree-prof/loop-split-4.c: New test.

Diff:
---
 gcc/testsuite/gcc.dg/tree-prof/loop-split-4.c | 38 +++++++++++++++++++
 gcc/tree-ssa-loop-split.cc                    | 54 +++++++++++++++++----------
 2 files changed, 72 insertions(+), 20 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-prof/loop-split-4.c 
b/gcc/testsuite/gcc.dg/tree-prof/loop-split-4.c
new file mode 100644
index 000000000000..1c67e30f4dec
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-prof/loop-split-4.c
@@ -0,0 +1,38 @@
+/* PR tree-optimization/XXXXX */
+/* { dg-options "-O2 -fdump-tree-lsplit-details" } */
+
+volatile int sink;
+
+__attribute__((noinline)) int
+helper (int a, int b)
+{
+  return a + b;
+}
+
+int
+main (void)
+{
+  int n = 100, t = 3, total = 0;
+  /* With t=3, n=100 the guard "t < i" is false for i=0..3 (4 iterations,
+     empty else) and true for i=4..99 (96 iterations, calls helper).
+     split_at_bb_p swaps "t < i" to "i > t" (GT_EXPR), giving
+     initial_true = false.  Loop1 (cold, 4 iterations) handles the false
+     case, loop2 (hot, 96 iterations) handles the true case.
+     Without the fix loop1 was scaled by true_edge->probability (96%),
+     inverting the counts.  */
+  for (int i = 0; i < n; i++)
+    if (t < i)
+      total += helper (i, t);
+  sink = total;
+  return 0;
+}
+/* { dg-final-use-not-autofdo { scan-tree-dump-times "Loop split" 1 "lsplit" } 
} */
+/* { dg-final-use-not-autofdo { scan-tree-dump-times "Invalid sum" 0 "lsplit" 
} } */
+/* { dg-final-use-not-autofdo { scan-tree-dump "Split loop: initial_true 
false" "lsplit" } } */
+/* With the fix loop1 (cold, 4 iterations) count ~4, loop2 (hot, 96
+   iterations) count ~96.  Without the fix the counts are inverted.
+   Check loop1 is a single-digit and loop2 is 90+.  */
+/* { dg-final-use-not-autofdo { scan-tree-dump "loop1 count \[0-9\], loop2 
count 9\[0-9\]" "lsplit" } } */
+/* The skip edge from loop1 to loop2 should use loop1_prob.invert (),
+   where loop1_prob corresponds to the false edge when initial_true is false.  
*/
+/* { dg-final-use-not-autofdo { scan-tree-dump "goto <bb \[0-9\]+>; .99\\." 
"lsplit" } } */
diff --git a/gcc/tree-ssa-loop-split.cc b/gcc/tree-ssa-loop-split.cc
index 722383a420ac..22a9656f0a56 100644
--- a/gcc/tree-ssa-loop-split.cc
+++ b/gcc/tree-ssa-loop-split.cc
@@ -521,37 +521,38 @@ compute_new_first_bound (gimple_seq *stmts, class 
tree_niter_desc *niter,
 }
 
 /* Fix the two loop's bb count after split based on the split edge probability,
-   don't adjust the bbs dominated by true branches of that loop to avoid
+   don't adjust the bbs dominated by the branch kept in that loop to avoid
    dropping 1s down.  */
 static void
-fix_loop_bb_probability (class loop *loop1, class loop *loop2, edge true_edge,
-                        edge false_edge)
+fix_loop_bb_probability (class loop *loop1, class loop *loop2, edge loop1_edge,
+                        edge loop2_edge)
 {
-  /* Proportion first loop's bb counts except those dominated by true
-     branch to avoid drop 1s down.  */
+  /* Proportion first loop's bb counts except those dominated by the
+     branch that stays true/false in the first loop, to avoid dropping
+     1s down.  */
   basic_block *bbs1, *bbs2;
   bbs1 = get_loop_body (loop1);
   unsigned j;
   for (j = 0; j < loop1->num_nodes; j++)
     if (bbs1[j] == loop1->latch
-       /* Watch for case where the true conditional is empty.  */
-       || !single_pred_p (true_edge->dest)
-       || !dominated_by_p (CDI_DOMINATORS, bbs1[j], true_edge->dest))
+       /* Watch for case where the kept conditional arm is empty.  */
+       || !single_pred_p (loop1_edge->dest)
+       || !dominated_by_p (CDI_DOMINATORS, bbs1[j], loop1_edge->dest))
       bbs1[j]->count
-       = bbs1[j]->count.apply_probability (true_edge->probability);
+       = bbs1[j]->count.apply_probability (loop1_edge->probability);
   free (bbs1);
 
-  /* Proportion second loop's bb counts except those dominated by false
-     branch to avoid drop 1s down.  */
-  basic_block bbi_copy = get_bb_copy (false_edge->dest);
+  /* Proportion second loop's bb counts except those dominated by the
+     opposite branch, which is kept in the second loop.  */
+  basic_block bbi_copy = get_bb_copy (loop2_edge->dest);
   bbs2 = get_loop_body (loop2);
   for (j = 0; j < loop2->num_nodes; j++)
     if (bbs2[j] == loop2->latch
-       /* Watch for case where the flase conditional is empty.  */
+       /* Watch for case where the kept conditional arm is empty.  */
        || !single_pred_p (bbi_copy)
        || !dominated_by_p (CDI_DOMINATORS, bbs2[j], bbi_copy))
       bbs2[j]->count
-       = bbs2[j]->count.apply_probability (true_edge->probability.invert ());
+       = bbs2[j]->count.apply_probability (loop2_edge->probability);
   free (bbs2);
 }
 
@@ -675,6 +676,8 @@ split_loop (class loop *loop1)
 
        edge true_edge, false_edge;
        extract_true_false_edges_from_block (bbs[i], &true_edge, &false_edge);
+       edge loop1_edge = initial_true ? true_edge : false_edge;
+       edge loop2_edge = initial_true ? false_edge : true_edge;
 
        /* Now version the loop, placing loop2 after loop1 connecting
           them, and fix up SSA form for that.  */
@@ -683,7 +686,7 @@ split_loop (class loop *loop1)
 
        profile_probability loop1_prob
          = integer_onep (cond) ? profile_probability::always ()
-                               : true_edge->probability;
+                               : loop1_edge->probability;
        /* TODO: It is commonly a case that we know that both loops will be
           entered.  very_likely below is the probability that second loop will
           be entered given by connect_loops.  We should work out the common
@@ -709,7 +712,16 @@ split_loop (class loop *loop1)
                (loop_preheader_edge (loop2)->src)->probability
                        = loop1_prob.invert ();
 
-       fix_loop_bb_probability (loop1, loop2, true_edge, false_edge);
+       fix_loop_bb_probability (loop1, loop2, loop1_edge, loop2_edge);
+
+       if (dump_file && (dump_flags & TDF_DETAILS))
+         fprintf (dump_file,
+                  ";; Split loop: initial_true %s, "
+                  "loop1 count %" PRId64 ", loop2 count %" PRId64 "\n",
+                  initial_true ? "true" : "false",
+                  (int64_t) loop1->header->count.to_gcov_type (),
+                  (int64_t) loop2->header->count.to_gcov_type ());
+
        /* If conditional we split on has reliable profilea nd both
           preconditionals of loop1 and loop2 are constant true, we can
           only redistribute the iteration counts to the split loops.
@@ -727,10 +739,12 @@ split_loop (class loop *loop1)
        if (loop1->any_estimate
            && wi::fits_shwi_p (loop1->nb_iterations_estimate))
          {
-           sreal scale = true_edge->probability.reliable_p ()
-                         ? true_edge->probability.to_sreal () : (sreal)1;
-           sreal scale2 = false_edge->probability.reliable_p ()
-                         ? false_edge->probability.to_sreal () : (sreal)1;
+           sreal scale
+             = loop1_edge->probability.reliable_p ()
+               ? loop1_edge->probability.to_sreal () : (sreal)1;
+           sreal scale2
+             = loop2_edge->probability.reliable_p ()
+               ? loop2_edge->probability.to_sreal () : (sreal)1;
            sreal div1 = loop1_prob.initialized_p ()
                         ? loop1_prob.to_sreal () : (sreal)1/(sreal)2;
            /* +1 to get header interations rather than latch iterations and 
then

[gcc r17-581] [PATCH v2] tree-optimization: Fix profile update in loop splitting (initial_true=false)

Reply via email to