https://github.com/jdenny-ornl updated 
https://github.com/llvm/llvm-project/pull/157754

From 75a8df62df2ef7e8c02d7a76120e57e2dd1a1539 Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" <jdenny.o...@gmail.com>
Date: Tue, 9 Sep 2025 17:33:38 -0400
Subject: [PATCH] [LoopUnroll] Fix block frequencies when no runtime

This patch implements the LoopUnroll changes discussed in [[RFC] Fix
Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785)
and is thus another step in addressing issue #135812.

In summary, for partial loop unrolling without a runtime remainder loop
(i.e., without `-unroll-runtime`), this patch changes LoopUnroll to:

- Maintain branch weights consistently with the original loop for the
  sake of preserving the total frequency of the original loop body.
- Store the new estimated trip count in the
  `llvm.loop.estimated_trip_count` metadata, introduced by PR #148758.
- Correct the new estimated trip count (e.g., 3 instead of 2) when the
  original estimated trip count (e.g., 10) divided by the unroll count
  (e.g., 4) leaves a remainder (e.g., 2).
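
As a rough illustration of the arithmetic behind these points, here is a
minimal standalone sketch. It is not LLVM code: `newEstimatedTripCount` and
`unrolledBodyFrequencies` are hypothetical helpers, and the frequency model is
a simplified closed form that assumes every unrolled latch keeps the original
exit/continue weights and that the exit weight is non-zero. LLVM's actual
block frequency analysis may round differently.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (not an LLVM API): mirrors the rounding this patch
// applies to the estimated trip count when no remainder loop is created.
unsigned newEstimatedTripCount(unsigned OriginalTripCount,
                               unsigned UnrollCount, bool Runtime) {
  assert(UnrollCount > 0 && "unroll count must be positive");
  unsigned NewTripCount = OriginalTripCount / UnrollCount;
  // Without a remainder loop, leftover iterations cost one more trip
  // through the unrolled loop.
  if (!Runtime && OriginalTripCount % UnrollCount)
    NewTripCount += 1;
  return NewTripCount;
}

// Hypothetical model (an assumption, not LLVM's implementation): frequency
// of each unrolled body copy per loop entry when every latch keeps the
// original branch weights ExitWeight:ContinueWeight.
std::vector<double> unrolledBodyFrequencies(unsigned ExitWeight,
                                            unsigned ContinueWeight,
                                            unsigned UnrollCount) {
  assert(ExitWeight > 0 && UnrollCount > 0);
  double P = double(ContinueWeight) / (double(ExitWeight) + ContinueWeight);
  // The first copy is reached from the entry (frequency 1) and from the
  // backedge after UnrollCount continues: F = 1 + P^UnrollCount * F.
  double F = 1.0 / (1.0 - std::pow(P, double(UnrollCount)));
  std::vector<double> Freqs;
  for (unsigned I = 0; I < UnrollCount; ++I) {
    Freqs.push_back(F);
    F *= P; // Each later copy is reached only if the previous latch continues.
  }
  return Freqs;
}
```

Under these assumptions, `newEstimatedTripCount(10, 4, false)` gives 3, and
`unrolledBodyFrequencies(1, 9, 4)` gives frequencies that sum to the original
body frequency of 10, which is why the latch weights are left unchanged.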

There are loop unrolling cases this patch does not fully fix, such as
partial unrolling with a runtime and complete unrolling, and there are
two associated tests this patch marks as XFAIL.  They will be
addressed in future patches that should land with this patch.
---
 llvm/lib/Transforms/Utils/LoopUnroll.cpp      | 36 ++++++++--
 .../peel.ll}                                  |  0
 .../branch-weights-freq/unroll-partial.ll     | 68 +++++++++++++++++++
 .../LoopUnroll/runtime-loop-branchweight.ll   |  1 +
 .../LoopUnroll/unroll-heuristics-pgo.ll       |  1 +
 5 files changed, 100 insertions(+), 6 deletions(-)
 rename llvm/test/Transforms/LoopUnroll/{peel-branch-weights-freq.ll => branch-weights-freq/peel.ll} (100%)
 create mode 100644 llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-partial.ll

diff --git a/llvm/lib/Transforms/Utils/LoopUnroll.cpp b/llvm/lib/Transforms/Utils/LoopUnroll.cpp
index 8a6c7789d1372..93c43396c54b6 100644
--- a/llvm/lib/Transforms/Utils/LoopUnroll.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUnroll.cpp
@@ -499,9 +499,8 @@ llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI,
 
   const unsigned MaxTripCount = SE->getSmallConstantMaxTripCount(L);
   const bool MaxOrZero = SE->isBackedgeTakenCountMaxOrZero(L);
-  unsigned EstimatedLoopInvocationWeight = 0;
   std::optional<unsigned> OriginalTripCount =
-      llvm::getLoopEstimatedTripCount(L, &EstimatedLoopInvocationWeight);
+      llvm::getLoopEstimatedTripCount(L);
 
   // Effectively "DCE" unrolled iterations that are beyond the max tripcount
   // and will never be executed.
@@ -1130,10 +1129,35 @@ llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI,
     // We shouldn't try to use `L` anymore.
     L = nullptr;
   } else if (OriginalTripCount) {
-    // Update the trip count. Note that the remainder has already logic
-    // computing it in `UnrollRuntimeLoopRemainder`.
-    setLoopEstimatedTripCount(L, *OriginalTripCount / ULO.Count,
-                              EstimatedLoopInvocationWeight);
+    // Update metadata for the estimated trip count.
+    //
+    // If ULO.Runtime, UnrollRuntimeLoopRemainder handles branch weights for the
+    // remainder loop it creates, and the unrolled loop's branch weights are
+    // adjusted below.  Otherwise, if unrolled loop iterations' latches become
+    // unconditional, branch weights are adjusted above.  Otherwise, the
+    // original loop's branch weights are correct for the unrolled loop, so do
+    // not adjust them.
+    // FIXME: Actually handle such unconditional latches and ULO.Runtime.
+    //
+    // For example, consider what happens if the unroll count is 4 for a loop
+    // with an estimated trip count of 10 when we do not create a remainder loop
+    // and all iterations' latches remain conditional.  Each unrolled
+    // iteration's latch still has the same probability of exiting the loop as
+    // it did when in the original loop, and thus it should still have the same
+    // branch weights.  Each unrolled iteration's non-zero probability of
+    // exiting already appropriately reduces the probability of reaching the
+    // remaining iterations just as it did in the original loop.  Trying to also
+    // adjust the branch weights of the final unrolled iteration's latch (i.e.,
+    // the backedge for the unrolled loop as a whole) to reflect its new trip
+    // count of 3 will erroneously further reduce its block frequencies.
+    // However, in case an analysis later needs to estimate the trip count of
+    // the unrolled loop as a whole without considering the branch weights for
+    // each unrolled iteration's latch within it, we store the new trip count as
+    // separate metadata.
+    unsigned NewTripCount = *OriginalTripCount / ULO.Count;
+    if (!ULO.Runtime && *OriginalTripCount % ULO.Count)
+      NewTripCount += 1;
+    setLoopEstimatedTripCount(L, NewTripCount);
   }
 
   // LoopInfo should not be valid, confirm that.
diff --git a/llvm/test/Transforms/LoopUnroll/peel-branch-weights-freq.ll b/llvm/test/Transforms/LoopUnroll/branch-weights-freq/peel.ll
similarity index 100%
rename from llvm/test/Transforms/LoopUnroll/peel-branch-weights-freq.ll
rename to llvm/test/Transforms/LoopUnroll/branch-weights-freq/peel.ll
diff --git a/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-partial.ll b/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-partial.ll
new file mode 100644
index 0000000000000..cde9d46ee8421
--- /dev/null
+++ b/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-partial.ll
@@ -0,0 +1,68 @@
+; Test branch weight metadata, estimated trip count metadata, and block
+; frequencies after partial loop unrolling without -unroll-runtime.
+
+; RUN: opt < %s -S -passes='print<block-freq>' 2>&1 | \
+; RUN:   FileCheck -check-prefix=CHECK %s
+
+; The -implicit-check-not options make sure that no additional labels or calls
+; to @f show up.
+; RUN: opt < %s -S -passes='loop-unroll,print<block-freq>' \
+; RUN:     -unroll-count=4 2>&1 | \
+; RUN:   FileCheck %s -check-prefix=CHECK-UR \
+; RUN:       -implicit-check-not='{{^( *- )?[^ ;]*:}}' \
+; RUN:       -implicit-check-not='call void @f'
+
+; CHECK: block-frequency-info: test
+; CHECK: do.body: float = 10.0,
+
+; The sum should still be ~10.
+;
+; CHECK-UR: block-frequency-info: test
+; CHECK-UR: - [[ENTRY:.*]]:
+; CHECK-UR: - [[DO_BODY:.*]]: float = 2.9078,
+; CHECK-UR: - [[DO_BODY_1:.*]]: float = 2.617,
+; CHECK-UR: - [[DO_BODY_2:.*]]: float = 2.3553,
+; CHECK-UR: - [[DO_BODY_3:.*]]: float = 2.1198,
+; CHECK-UR: - [[DO_END:.*]]:
+
+declare void @f(i32)
+
+define void @test(i32 %n) {
+; CHECK-UR-LABEL: define void @test(i32 %{{.*}}) {
+;       CHECK-UR: [[ENTRY]]:
+;       CHECK-UR:   br label %[[DO_BODY]]
+;       CHECK-UR: [[DO_BODY]]:
+;       CHECK-UR:   call void @f
+;       CHECK-UR:   br i1 %{{.*}}, label %[[DO_END]], label %[[DO_BODY_1]], !prof ![[#PROF:]]
+;       CHECK-UR: [[DO_BODY_1]]:
+;       CHECK-UR:   call void @f
+;       CHECK-UR:   br i1 %{{.*}}, label %[[DO_END]], label %[[DO_BODY_2]], !prof ![[#PROF]]
+;       CHECK-UR: [[DO_BODY_2]]:
+;       CHECK-UR:   call void @f
+;       CHECK-UR:   br i1 %{{.*}}, label %[[DO_END]], label %[[DO_BODY_3]], !prof ![[#PROF]]
+;       CHECK-UR: [[DO_BODY_3]]:
+;       CHECK-UR:   call void @f
+;       CHECK-UR:   br i1 %{{.*}}, label %[[DO_END]], label %[[DO_BODY]], !prof ![[#PROF]], !llvm.loop ![[#LOOP_UR_LATCH:]]
+;       CHECK-UR: [[DO_END]]:
+;       CHECK-UR:   ret void
+
+entry:
+  br label %do.body
+
+do.body:
+  %i = phi i32 [ 0, %entry ], [ %inc, %do.body ]
+  %inc = add i32 %i, 1
+  call void @f(i32 %i)
+  %c = icmp sge i32 %inc, %n
+  br i1 %c, label %do.end, label %do.body, !prof !0
+
+do.end:
+  ret void
+}
+
+!0 = !{!"branch_weights", i32 1, i32 9}
+
+; CHECK-UR: ![[#PROF]] = !{!"branch_weights", i32 1, i32 9}
+; CHECK-UR: ![[#LOOP_UR_LATCH]] = distinct !{![[#LOOP_UR_LATCH]], ![[#LOOP_UR_TC:]], ![[#DISABLE:]]}
+; CHECK-UR: ![[#LOOP_UR_TC]] = !{!"llvm.loop.estimated_trip_count", i32 3}
+; CHECK-UR: ![[#DISABLE]] = !{!"llvm.loop.unroll.disable"}
diff --git a/llvm/test/Transforms/LoopUnroll/runtime-loop-branchweight.ll b/llvm/test/Transforms/LoopUnroll/runtime-loop-branchweight.ll
index 26171990a2592..36c0497a002cf 100644
--- a/llvm/test/Transforms/LoopUnroll/runtime-loop-branchweight.ll
+++ b/llvm/test/Transforms/LoopUnroll/runtime-loop-branchweight.ll
@@ -1,4 +1,5 @@
 ; RUN: opt < %s -S -passes=loop-unroll -unroll-runtime=true -unroll-count=4 | FileCheck %s
+; XFAIL: *
 
 ;; Check that the remainder loop is properly assigned a branch weight for its latch branch.
 ; CHECK-LABEL: @test(
diff --git a/llvm/test/Transforms/LoopUnroll/unroll-heuristics-pgo.ll b/llvm/test/Transforms/LoopUnroll/unroll-heuristics-pgo.ll
index 611ee5fb5807e..d59a0298b3106 100644
--- a/llvm/test/Transforms/LoopUnroll/unroll-heuristics-pgo.ll
+++ b/llvm/test/Transforms/LoopUnroll/unroll-heuristics-pgo.ll
@@ -1,4 +1,5 @@
 ; RUN: opt < %s -S -passes=loop-unroll -unroll-runtime -unroll-threshold=40 -unroll-max-percent-threshold-boost=100 | FileCheck %s
+; XFAIL: *
 
 @known_constant = internal unnamed_addr constant [9 x i32] [i32 0, i32 -1, i32 0, i32 -1, i32 5, i32 -1, i32 0, i32 -1, i32 0], align 16
 
