[llvm-branch-commits] [llvm] [LoopUnroll] Fix freqs for unconditional latches: N>2, uniform (PR #182405)

Joel E. Denny via llvm-branch-commits Thu, 19 Feb 2026 15:55:31 -0800

https://github.com/jdenny-ornl created 
https://github.com/llvm/llvm-project/pull/182405


This patch introduces the command-line option `-unroll-uniform-weights`.  When 
computing probabilities for the remaining N conditional latches in the unrolled 
loop after converting some iterations' latches to unconditional, LoopUnroll now 
supports the following three strategies:

- A. If N <= 2, use a simple formula to compute a single uniform probability 
across those latches.
- B. Otherwise, if `-unroll-uniform-weights` is not specified, apply the 
original loop's probability to all N latches and then, as needed, adjust as few 
of them as possible.
- C. Otherwise, bisect the range [0,1] to find a single uniform probability 
across all N latches.  This patch implements this strategy.

An issue with C is that it could impact compiler performance, so this patch 
makes it opt-in.  Its appeal over B is that it treats all latches the same 
given that we have no evidence showing that any latch should have a higher or 
lower probability than any other.  A has neither problem, but I do not know how 
to apply it for N > 2.  More experience or feedback from others might determine 
that some strategies are not worthwhile to maintain.

>From d3054a4e08388bda0bb58a0557b871dd6c6cb9df Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" <[email protected]>
Date: Thu, 19 Feb 2026 17:53:59 -0500
Subject: [PATCH] [LoopUnroll] Fix freqs for unconditional latches: N>2,
 uniform

This patch introduces the command-line option
`-unroll-uniform-weights`.  When computing probabilities for the
remaining N conditional latches in the unrolled loop after converting
some iterations' latches to unconditional, LoopUnroll now supports the
following three strategies:

- A. If N <= 2, use a simple formula to compute a single uniform
  probability across those latches.
- B. Otherwise, if `-unroll-uniform-weights` is not specified, apply
  the original loop's probability to all N latches and then, as
  needed, adjust as few of them as possible.
- C. Otherwise, bisect the range [0,1] to find a single uniform
  probability across all N latches.  This patch implements this
  strategy.

An issue with C is that it could impact compiler performance, so this
patch makes it opt-in.  Its appeal over B is that it treats all
latches the same given that we have no evidence showing that any latch
should have a higher or lower probability than any other.  A has
neither problem, but I do not know how to apply it for N > 2.  More
experience or feedback from others might determine that some
strategies are not worthwhile to maintain.
---
 llvm/lib/Transforms/Utils/LoopUnroll.cpp      | 112 +++++++-
 .../branch-weights-freq/unroll-complete.ll    | 267 +++++++++++++-----
 .../unroll-partial-unconditional-latch.ll     |  70 +++--
 3 files changed, 336 insertions(+), 113 deletions(-)

diff --git a/llvm/lib/Transforms/Utils/LoopUnroll.cpp 
b/llvm/lib/Transforms/Utils/LoopUnroll.cpp
index 404e254c8a66f..1028b0f026ea3 100644
--- a/llvm/lib/Transforms/Utils/LoopUnroll.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUnroll.cpp
@@ -89,6 +89,11 @@ UnrollRuntimeEpilog("unroll-runtime-epilog", 
cl::init(false), cl::Hidden,
                     cl::desc("Allow runtime unrolled loops to be unrolled "
                              "with epilog instead of prolog."));
 
+static cl::opt<bool> UnrollUniformWeights(
+    "unroll-uniform-weights", cl::init(false), cl::Hidden,
+    cl::desc("If new branch weights must be found, work harder to keep them "
+             "uniform."));
+
 static cl::opt<bool>
 UnrollVerifyDomtree("unroll-verify-domtree", cl::Hidden,
                     cl::desc("Verify domtree after unrolling"),
@@ -493,9 +498,9 @@ static bool canHaveUnrollRemainder(const Loop *L) {
 //
 // There are often many sets of latch probabilities that can produce the
 // original total loop body frequency.  If there are many remaining conditional
-// latches, this function just quickly hacks a few of their probabilities to
-// restore the original total loop body frequency.  Otherwise, it determines
-// less arbitrary probabilities.
+// latches and !UnrollUniformWeights, this function just quickly hacks a few of
+// their probabilities to restore the original total loop body frequency.
+// Otherwise, it tries harder to determine less arbitrary probabilities.
 static void fixProbContradiction(UnrollLoopOptions ULO,
                                  BranchProbability OriginalLoopProb,
                                  bool CompletelyUnroll,
@@ -580,10 +585,11 @@ static void fixProbContradiction(UnrollLoopOptions ULO,
       SetProb(I, Prob);
   };
 
-  // If n <= 2, we choose the simplest probability model we can think of: every
-  // remaining conditional branch instruction has the same probability, Prob,
-  // of continuing to the next iteration.  This model has several helpful
-  // properties:
+  // If UnrollUniformWeights or n <= 2, we choose the simplest probability 
model
+  // we can think of: every remaining conditional branch instruction has the
+  // same probability, Prob, of continuing to the next iteration.  This model
+  // has several helpful properties:
+  // - There is only one search parameter, Prob.
   // - We have no reason to think one latch branch's probability should be
   //   higher or lower than another, and so this model makes them all the same.
   //   In the worst cases, we thus avoid setting just some probabilities to 0 
or
@@ -596,14 +602,32 @@ static void fixProbContradiction(UnrollLoopOptions ULO,
   //
   //     FreqOne = Sum(i=0..n)(c_i * p^i)
   //
-  // - If the backedge has been eliminated, FreqOne is the total frequency of
-  //   the original loop body in the unrolled loop.
-  // - If the backedge remains, Sum(i=0..inf)(FreqOne * p^(n*i)) =
-  //   FreqOne / (1 - p^n) is the total frequency of the original loop body in
-  //   the unrolled loop, regardless of whether the backedge is conditional or
-  //   unconditional.
+  // - If the backedge has been eliminated:
+  //   - FreqOne is the total frequency of the original loop body in the
+  //     unrolled loop.
+  //   - If Prob == 1, the total frequency of the original loop body is exactly
+  //     the number of remaining loop iterations, as expected because every
+  //     remaining loop iteration always then executes.
+  // - If the backedge remains:
+  //   - Sum(i=0..inf)(FreqOne * p^(n*i)) = FreqOne / (1 - p^n) is the total
+  //     frequency of the original loop body in the unrolled loop, regardless 
of
+  //     whether the backedge is conditional or unconditional.
+  //   - As Prob approaches 1, the total frequency of the original loop body
+  //     approaches infinity, as expected because the loop approaches never
+  //     exiting.
   // - For n <= 2, we can use simple formulas to solve the above polynomial
   //   equations exactly for p without performing a search.
+  // - For n > 2, evaluating each point in the search space, using ComputeFreq
+  //   below, requires about as few instructions as we could hope for.  That 
is,
+  //   the probability is constant across the conditional branches, so the only
+  //   computation is across conditional branches and any backedge, as required
+  //   for any model for Prob.
+  // - Prob == 1 produces the maximum possible total frequency for the original
+  //   loop body, as described above.  Prob == 0 produces the minimum, 0.
+  //   Increasing or decreasing Prob monotonically increases or decreases the
+  //   frequency, respectively.  Thus, for every possible frequency, there
+  //   exists some Prob that can produce it, and we can easily use bisection to
+  //   search the problem space.
 
   // When iterating for a solution, we stop early if we find probabilities
   // that produce a Freq whose difference from FreqDesired is small
@@ -611,6 +635,21 @@ static void fixProbContradiction(UnrollLoopOptions ULO,
   // accurate (but surely far more accurate).
   const double FreqPrec = 1e-6;
 
+  // Compute the new frequency produced by using Prob throughout CondLatches.
+  auto ComputeFreq = [&](double Prob) {
+    double ProbReaching = 1;        // p^0
+    double FreqOne = IterCounts[0]; // c_0*p^0
+    for (unsigned I = 0, E = CondLatches.size(); I < E; ++I) {
+      ProbReaching *= Prob;                        // p^(I+1)
+      FreqOne += IterCounts[I + 1] * ProbReaching; // c_(I+1)*p^(I+1)
+    }
+    double ProbReachingBackedge = CompletelyUnroll ? 0 : ProbReaching;
+    assert(FreqOne > 0 && "Expected at least one iteration before first 
latch");
+    if (ProbReachingBackedge == 1)
+      return std::numeric_limits<double>::infinity();
+    return FreqOne / (1 - ProbReachingBackedge);
+  };
+
   // Compute the probability that, used at CondLaches[0] where
   // CondLatches.size() == 1, gets as close as possible to FreqDesired.
   auto ComputeProbForLinear = [&]() {
@@ -619,6 +658,11 @@ static void fixProbContradiction(UnrollLoopOptions ULO,
     double B = IterCounts[0] - FreqDesired;
     assert(A > 0 && "Expected iterations after last conditional latch");
     double Prob = -B / A;
+    // If it computes an invalid Prob, FreqDesired is impossibly low or high.
+    // Otherwise, Prob should produce nearly FreqDesired.
+    assert((Prob < 0 || Prob > 1 ||
+            fabs(ComputeFreq(Prob) - FreqDesired) < FreqPrec) &&
+           "Expected accurate frequency when linear case is possible");
     Prob = std::max(Prob, 0.);
     Prob = std::min(Prob, 1.);
     return Prob;
@@ -633,6 +677,11 @@ static void fixProbContradiction(UnrollLoopOptions ULO,
     double C = IterCounts[0] - FreqDesired;
     assert(A > 0 && "Expected iterations after last conditional latch");
     double Prob = (-B + sqrt(B * B - 4 * A * C)) / (2 * A);
+    // If it computes an invalid Prob, FreqDesired is impossibly low or high.
+    // Otherwise, Prob should produce nearly FreqDesired.
+    assert((Prob < 0 || Prob > 1 ||
+            fabs(ComputeFreq(Prob) - FreqDesired) < FreqPrec) &&
+           "Expected accurate frequency when quadratic case is possible");
     Prob = std::max(Prob, 0.);
     Prob = std::min(Prob, 1.);
     return Prob;
@@ -733,12 +782,17 @@ static void fixProbContradiction(UnrollLoopOptions ULO,
   };
 
   // Determine and set branch weights.
+  //
+  // Prob < 0 and Prob > 1 cannot be represented as branch weights.  We might
+  // compute such a Prob if FreqDesired is impossible (e.g., due to inaccurate
+  // profile data) for the maximum trip count we have determined when 
completely
+  // unrolling.  In that case, so just go with whichever is closest.
   if (CondLatches.size() == 1) {
     SetAllProbs(ComputeProbForLinear());
   } else if (CondLatches.size() == 2) {
     SetAllProbs(ComputeProbForQuadratic());
-  } else {
-    // The polynomial is too complex for a simple formula, so the quick and
+  } else if (!UnrollUniformWeights) {
+    // The polynomial is too complex for a simple formula, and the quick and
     // dirty fix has been selected.  Adjust probabilities starting from the
     // first latch, which has the most influence on the total frequency, so
     // starting there should minimize the number of latches that have to be
@@ -752,6 +806,34 @@ static void fixProbContradiction(UnrollLoopOptions ULO,
       if (fabs(Freq - FreqDesired) < FreqPrec)
         break;
     }
+  } else {
+    // The polynomial is too complex for a simple formula, and uniform branch
+    // weights have been selected, so bisect.
+    double ProbMin, ProbMax, ProbPrev;
+    auto TryProb = [&](double Prob) {
+      ProbPrev = Prob;
+      double FreqDelta = ComputeFreq(Prob) - FreqDesired;
+      if (fabs(FreqDelta) < FreqPrec)
+        return 0;
+      if (FreqDelta < 0) {
+        ProbMin = Prob;
+        return -1;
+      }
+      ProbMax = Prob;
+      return 1;
+    };
+    // If Prob == 0 is too small and Prob == 1 is too large, bisect between
+    // them.  To place a hard upper limit on the search time, stop bisecting
+    // when Prob stops changing (ProbDelta) by much (ProbPrec).
+    if (TryProb(0.) < 0 && TryProb(1.) > 0) {
+      const double ProbPrec = 1e-12;
+      double Prob, ProbDelta;
+      do {
+        Prob = (ProbMin + ProbMax) / 2;
+        ProbDelta = Prob - ProbPrev;
+      } while (TryProb(Prob) != 0 && fabs(ProbDelta) > ProbPrec);
+    }
+    SetAllProbs(ProbPrev);
   }
 
   // FIXME: We have not considered non-latch loop exits:
diff --git 
a/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-complete.ll 
b/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-complete.ll
index 353e74be9fbd1..69da20802a0ae 100644
--- a/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-complete.ll
+++ b/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-complete.ll
@@ -58,12 +58,14 @@
 ; Check 1 max iteration:
 ; - Unroll count of >=1 should always produce complete unrolling.
 ; - That produces 0 unrolled iteration latches, so there are no branch weights
-;   to compute.
+;   to compute.  Thus, -unroll-uniform-weights has no effect.
 ;
 ; Original loop body frequency is 2 (loop weight 1), which is impossibly high.
 ;
 ;   RUN: sed -e s/@MAX@/1/ -e s/@W@/1/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
 ;   RUN: %{bf-fc} ORIG1210
+;   RUN: %{ur-bf} -unroll-count=1 -unroll-uniform-weights | %{fc} UR1210
+;   RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR1210
 ;   RUN: %{ur-bf} -unroll-count=1 | %{fc} UR1210
 ;   RUN: %{ur-bf} -unroll-count=2 | %{fc} UR1210
 ;
@@ -77,6 +79,8 @@
 ;
 ;   RUN: sed -e s/@MAX@/1/ -e s/@W@/0/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
 ;   RUN: %{bf-fc} ORIG1110
+;   RUN: %{ur-bf} -unroll-count=1 -unroll-uniform-weights | %{fc} UR1110
+;   RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR1110
 ;   RUN: %{ur-bf} -unroll-count=1 | %{fc} UR1110
 ;   RUN: %{ur-bf} -unroll-count=2 | %{fc} UR1110
 ;
@@ -90,7 +94,8 @@
 ; Check 2 max iterations:
 ; - Unroll count of >=2 should always produce complete unrolling.
 ; - That produces <=1 unrolled iteration latch, so the implementation can
-;   compute uniform weights by solving, at worst, a linear equation.
+;   compute uniform weights by solving, at worst, a linear equation.  Thus,
+;   -unroll-uniform-weights has no effect.
 ;
 ; Original loop body frequency is 3 (loop weight 2), which is impossibly high.
 ;
@@ -99,6 +104,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/2/ -e s/@W@/2/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG2310
+;     RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2310
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2310
 ;     RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2310
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2310
 ;
@@ -120,6 +127,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/2/ -e s/@W@/2/ -e s/@MIN@/2/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG2320
+;     RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2320
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2320
 ;     RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2320
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2320
 ;
@@ -140,6 +149,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/2/ -e s/@W@/1/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG2210
+;     RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2210
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2210
 ;     RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2210
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2210
 ;
@@ -159,6 +170,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/2/ -e s/@W@/1/ -e s/@MIN@/2/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG2220
+;     RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2220
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2220
 ;     RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2220
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2220
 ;
@@ -179,6 +192,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/2/ -e s/@W@/0/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG2110
+;     RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2110
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2110
 ;     RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2110
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2110
 ;
@@ -198,6 +213,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/2/ -e s/@W@/0/ -e s/@MIN@/2/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG2120
+;     RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2120
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2120
 ;     RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2120
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2120
 ;
@@ -215,7 +232,8 @@
 ; Check 3 max iterations:
 ; - Unroll count of >=3 should always produce complete unrolling.
 ; - That produces <=2 unrolled iteration latches, so the implementation can
-;   compute uniform weights solving, at worst, a quadratic equation.
+;   compute uniform weights solving, at worst, a quadratic equation.  Thus,
+;   -unroll-uniform-weights has no effect.
 ;
 ; Original loop body frequency is 4 (loop weight 3), which is impossibly high.
 ;
@@ -224,6 +242,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/3/ -e s/@W@/3/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG3410
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3410
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3410
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3410
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3410
 ;
@@ -248,6 +268,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/3/ -e s/@W@/3/ -e s/@MIN@/3/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG3430
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3430
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3430
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3430
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3430
 ;
@@ -269,6 +291,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/3/ -e s/@W@/3/ -e s/@MIN@/3/ -e s/@I_0@/%x/ %s > 
%t.ll
 ;     RUN: %{bf-fc} ORIG343x
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR343x
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR343x
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR343x
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR343x
 ;
@@ -295,6 +319,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/3/ -e s/@W@/2/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG3310
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3310
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3310
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3310
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3310
 ;
@@ -317,6 +343,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/3/ -e s/@W@/2/ -e s/@MIN@/3/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG3330
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3330
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3330
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3330
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3330
 ;
@@ -338,6 +366,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/3/ -e s/@W@/2/ -e s/@MIN@/3/ -e s/@I_0@/%x/ %s > 
%t.ll
 ;     RUN: %{bf-fc} ORIG333x
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR333x
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR333x
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR333x
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR333x
 ;
@@ -363,6 +393,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/3/ -e s/@W@/1/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG3210
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3210
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3210
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3210
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3210
 ;
@@ -385,6 +417,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/3/ -e s/@W@/1/ -e s/@MIN@/3/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG3230
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3230
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3230
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3230
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3230
 ;
@@ -406,6 +440,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/3/ -e s/@W@/1/ -e s/@MIN@/3/ -e s/@I_0@/%x/ %s > 
%t.ll
 ;     RUN: %{bf-fc} ORIG323x
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR323x
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR323x
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR323x
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR323x
 ;
@@ -430,6 +466,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/3/ -e s/@W@/0/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG3110
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3110
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3110
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3110
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3110
 ;
@@ -452,6 +490,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/3/ -e s/@W@/0/ -e s/@MIN@/3/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG3130
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3130
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3130
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3130
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3130
 ;
@@ -473,6 +513,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/3/ -e s/@W@/0/ -e s/@MIN@/3/ -e s/@I_0@/%x/ %s > 
%t.ll
 ;     RUN: %{bf-fc} ORIG313x
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR313x
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR313x
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR313x
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR313x
 ;
@@ -496,6 +538,7 @@
 ; - Unroll count of >=4 should always produce complete unrolling.
 ; - That produces <=3 unrolled iteration latches.  3 is the lowest number where
 ;   the implementation cannot compute uniform weights using a simple formula.
+;   Thus, this is our first case where -unroll-uniform-weights matters.
 ;
 ; Original loop body frequency is 5 (loop weight 4), which is impossibly high.
 ;
@@ -504,6 +547,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/4/ -e s/@W@/4/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG4510
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4510
+;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4510
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4510
 ;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4510
 ;
@@ -531,6 +576,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/4/ -e s/@W@/4/ -e s/@MIN@/4/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG4540
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4540
+;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4540
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4540
 ;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4540
 ;
@@ -554,6 +601,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/4/ -e s/@W@/4/ -e s/@MIN@/4/ -e s/@I_0@/%x/ %s > 
%t.ll
 ;     RUN: %{bf-fc} ORIG454x
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR454x
+;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR454x
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR454x
 ;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR454x
 ;
@@ -582,6 +631,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/4/ -e s/@W@/3/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG4410
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4410
+;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4410
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4410
 ;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4410
 ;
@@ -607,6 +658,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/4/ -e s/@W@/3/ -e s/@MIN@/4/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG4440
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4440
+;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4440
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4440
 ;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4440
 ;
@@ -630,6 +683,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/4/ -e s/@W@/3/ -e s/@MIN@/4/ -e s/@I_0@/%x/ %s > 
%t.ll
 ;     RUN: %{bf-fc} ORIG444x
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR444x
+;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR444x
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR444x
 ;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR444x
 ;
@@ -650,39 +705,52 @@
 ;     UR444x:     !0 = !{!"branch_weights", i32 0, i32 -2147483648}
 ;
 ; Original loop body frequency is 3 (loop weight 2).  This is our first case
-; where the new probabilities vary.
+; where the new probabilities vary (unless -unroll-uniform-weights).
 ;
 ;   First use a variable iteration count so that all non-final unrolled
 ;   iterations' latches remain conditional.
 ;
 ;     RUN: sed -e s/@MAX@/4/ -e s/@W@/2/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG4310
-;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4310
-;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4310
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} 
UR4310,UNIF4310
+;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} 
UR4310,UNIF4310
+;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4310,FAST4310
+;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4310,FAST4310
 ;
 ;     The sum of the new do.body* is always approximately the old do.body.
 ;     ORIG4310: - do.body: float = 3.0,
-;     UR4310: - do.body: float = 1.0,
-;     UR4310: - do.body.1: float = 0.94737,
-;     UR4310: - do.body.2: float = 0.63158,
-;     UR4310: - do.body.3: float = 0.42105,
-;
-;     UR4310:  call void @f
-;     UR4310:  br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
-;     UR4310:  call void @f
-;     UR4310:  br i1 %{{.*}}, label %do.end, label %do.body.2, !prof !1
-;     UR4310:  call void @f
-;     UR4310:  br i1 %{{.*}}, label %do.end, label %do.body.3, !prof !1
-;     UR4310:  call void @f
-;     UR4310:  br label %do.end
-;     UR4310:  !0 = !{!"branch_weights", i32 113025456, i32 2034458192}
-;     UR4310:  !1 = !{!"branch_weights", i32 1, i32 2}
+;     UNIF4310: - do.body: float = 1.0,
+;     UNIF4310: - do.body.1: float = 0.81054,
+;     UNIF4310: - do.body.2: float = 0.65697,
+;     UNIF4310: - do.body.3: float = 0.5325,
+;     FAST4310: - do.body: float = 1.0,
+;     FAST4310: - do.body.1: float = 0.94737,
+;     FAST4310: - do.body.2: float = 0.63158,
+;     FAST4310: - do.body.3: float = 0.42105,
+;
+;     UR4310:        call void @f
+;     UR4310:        br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
+;     UR4310:        call void @f
+;     UR4310:        br i1 %{{.*}}, label %do.end, label %do.body.2,
+;     UNIF4310-SAME:   !prof !0
+;     FAST4310-SAME:   !prof !1
+;     UR4310:        call void @f
+;     UR4310:        br i1 %{{.*}}, label %do.end, label %do.body.3,
+;     UNIF4310-SAME:   !prof !0
+;     FAST4310-SAME:   !prof !1
+;     UR4310:        call void @f
+;     UR4310:        br label %do.end
+;     UNIF4310:      !0 = !{!"branch_weights", i32 406871040, i32 1740612608}
+;     FAST4310:      !0 = !{!"branch_weights", i32 113025456, i32 2034458192}
+;     FAST4310:      !1 = !{!"branch_weights", i32 1, i32 2}
 ;
 ;   Now use a constant iteration count so that all non-final unrolled
 ;   iterations' latches unconditionally continue.
 ;
 ;     RUN: sed -e s/@MAX@/4/ -e s/@W@/2/ -e s/@MIN@/4/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG4340
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4340
+;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4340
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4340
 ;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4340
 ;
@@ -706,6 +774,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/4/ -e s/@W@/2/ -e s/@MIN@/4/ -e s/@I_0@/%x/ %s > 
%t.ll
 ;     RUN: %{bf-fc} ORIG434x
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR434x
+;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR434x
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR434x
 ;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR434x
 ;
@@ -732,32 +802,45 @@
 ;
 ;     RUN: sed -e s/@MAX@/4/ -e s/@W@/1/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG4210
-;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4210
-;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4210
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} 
UR4210,UNIF4210
+;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} 
UR4210,UNIF4210
+;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4210,FAST4210
+;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4210,FAST4210
 ;
 ;     The sum of the new do.body* is always the old do.body.
 ;     ORIG4210: - do.body: float = 2.0,
-;     UR4210: - do.body: float = 1.0,
-;     UR4210: - do.body.1: float = 0.57143,
-;     UR4210: - do.body.2: float = 0.28571,
-;     UR4210: - do.body.3: float = 0.14286,
-;
-;     UR4210: call void @f
-;     UR4210: br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
-;     UR4210: call void @f
-;     UR4210: br i1 %{{.*}}, label %do.end, label %do.body.2, !prof !1
-;     UR4210: call void @f
-;     UR4210: br i1 %{{.*}}, label %do.end, label %do.body.3, !prof !1
-;     UR4210: call void @f
-;     UR4210: br label %do.end
-;     UR4210: !0 = !{!"branch_weights", i32 920350135, i32 1227133513}
-;     UR4210: !1 = !{!"branch_weights", i32 1, i32 1}
+;     UNIF4210: - do.body: float = 1.0,
+;     UNIF4210: - do.body.1: float = 0.54369,
+;     UNIF4210: - do.body.2: float = 0.2956,
+;     UNIF4210: - do.body.3: float = 0.16071,
+;     FAST4210: - do.body: float = 1.0,
+;     FAST4210: - do.body.1: float = 0.57143,
+;     FAST4210: - do.body.2: float = 0.28571,
+;     FAST4210: - do.body.3: float = 0.14286,
+;
+;     UR4210:        call void @f
+;     UR4210:        br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
+;     UR4210:        call void @f
+;     UR4210:        br i1 %{{.*}}, label %do.end, label %do.body.2,
+;     UNIF4210-SAME:   !prof !0
+;     FAST4210-SAME:   !prof !1
+;     UR4210:        call void @f
+;     UR4210:        br i1 %{{.*}}, label %do.end, label %do.body.3,
+;     UNIF4210-SAME:   !prof !0
+;     FAST4210-SAME:   !prof !1
+;     UR4210:        call void @f
+;     UR4210:        br label %do.end
+;     UNIF4210:      !0 = !{!"branch_weights", i32 979920896, i32 1167562752}
+;     FAST4210:      !0 = !{!"branch_weights", i32 920350135, i32 1227133513}
+;     FAST4210:      !1 = !{!"branch_weights", i32 1, i32 1}
 ;
 ;   Now use a constant iteration count so that all non-final unrolled
 ;   iterations' latches unconditionally continue.
 ;
 ;     RUN: sed -e s/@MAX@/4/ -e s/@W@/1/ -e s/@MIN@/4/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG4240
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4240
+;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4240
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4240
 ;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4240
 ;
@@ -781,6 +864,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/4/ -e s/@W@/1/ -e s/@MIN@/4/ -e s/@I_0@/%x/ %s > 
%t.ll
 ;     RUN: %{bf-fc} ORIG424x
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR424x
+;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR424x
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR424x
 ;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR424x
 ;
@@ -807,6 +892,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/4/ -e s/@W@/0/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG4110
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4110
+;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4110
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4110
 ;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4110
 ;
@@ -832,6 +919,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/4/ -e s/@W@/0/ -e s/@MIN@/4/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG4140
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4140
+;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4140
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4140
 ;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4140
 ;
@@ -855,6 +944,8 @@
 ;
 ;     RUN: sed -e s/@MAX@/4/ -e s/@W@/0/ -e s/@MIN@/4/ -e s/@I_0@/%x/ %s > 
%t.ll
 ;     RUN: %{bf-fc} ORIG414x
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR414x
+;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR414x
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR414x
 ;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR414x
 ;
@@ -880,14 +971,16 @@
 ; - Unroll count of >=5 should always produce complete unrolling.
 ; - That produces <=4 unrolled iteration latches.  When at least 3 remain
 ;   conditional, the implementation cannot compute uniform weights using a
-;   simple formula.
+;   simple formula, so -unroll-uniform-weights matters.
 ;
 ; Original loop body frequency is 5 (loop weight 4).
 ;
 ;   RUN: sed -e s/@MAX@/5/ -e s/@W@/4/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
 ;   RUN: %{bf-fc} ORIG5510
-;   RUN: %{ur-bf} -unroll-count=5 | %{fc} UR5510
-;   RUN: %{ur-bf} -unroll-count=6 | %{fc} UR5510
+;   RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} 
UR5510,UNIF5510
+;   RUN: %{ur-bf} -unroll-count=6 -unroll-uniform-weights | %{fc} 
UR5510,UNIF5510
+;   RUN: %{ur-bf} -unroll-count=5 | %{fc} UR5510,FAST5510
+;   RUN: %{ur-bf} -unroll-count=6 | %{fc} UR5510,FAST5510
 ;
 ;   The sum of the new do.body* is the old do.body.
 ;   ORIG5510: - do.body: float = 5.0,
@@ -899,55 +992,75 @@
 ;
 ;   All continue probabilities are approximately 1, but somehow there is less
 ;   precision in the calculation of the last case.
-;   UR5510: call void @f
-;   UR5510: br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
-;   UR5510: call void @f
-;   UR5510: br i1 %{{.*}}, label %do.end, label %do.body.2, !prof !0
-;   UR5510: call void @f
-;   UR5510: br i1 %{{.*}}, label %do.end, label %do.body.3, !prof !0
-;   UR5510: call void @f
-;   UR5510: br i1 %{{.*}}, label %do.end, label %do.body.4, !prof !1
-;   UR5510: call void @f
-;   UR5510: br label %do.end
-;   UR5510: !0 = !{!"branch_weights", i32 0, i32 -2147483648}
-;   UR5510: !1 = !{!"branch_weights", i32 10, i32 2147483638}
+;   UR5510:        call void @f
+;   UR5510:        br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
+;   UR5510:        call void @f
+;   UR5510:        br i1 %{{.*}}, label %do.end, label %do.body.2, !prof !0
+;   UR5510:        call void @f
+;   UR5510:        br i1 %{{.*}}, label %do.end, label %do.body.3, !prof !0
+;   UR5510:        call void @f
+;   UR5510:        br i1 %{{.*}}, label %do.end, label %do.body.4,
+;   UNIF5510-SAME:   !prof !0
+;   FAST5510-SAME:   !prof !1
+;   UR5510:        call void @f
+;   UR5510:        br label %do.end
+;   UNIF5510: !0 = !{!"branch_weights", i32 0, i32 -2147483648}
+;   FAST5510: !0 = !{!"branch_weights", i32 0, i32 -2147483648}
+;   FAST5510: !1 = !{!"branch_weights", i32 10, i32 2147483638}
 ;
 ; Original loop body frequency is 4 (loop weight 3).
 ;
 ;   RUN: sed -e s/@MAX@/5/ -e s/@W@/3/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
 ;   RUN: %{bf-fc} ORIG5410
-;   RUN: %{ur-bf} -unroll-count=5 | %{fc} UR5410
-;   RUN: %{ur-bf} -unroll-count=6 | %{fc} UR5410
+;   RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} 
UR5410,UNIF5410
+;   RUN: %{ur-bf} -unroll-count=6 -unroll-uniform-weights | %{fc} 
UR5410,UNIF5410
+;   RUN: %{ur-bf} -unroll-count=5 | %{fc} UR5410,FAST5410
+;   RUN: %{ur-bf} -unroll-count=6 | %{fc} UR5410,FAST5410
 ;
 ;   The sum of the new do.body* is always the old do.body.
 ;   ORIG5410: - do.body: float = 4.0,
-;   UR5410: - do.body: float = 1.0,
-;   UR5410: - do.body.1: float = 1.0,
-;   UR5410: - do.body.2: float = 0.86486,
-;   UR5410: - do.body.3: float = 0.64865,
-;   UR5410: - do.body.4: float = 0.48649,
-;
-;   This is our first case where the implementation must adjust multiple
-;   probabilities to something other than the original latch probability but
-;   does not just set all probabilities to the limit of 1 or 0.
-;   UR5410: call void @f
-;   UR5410: br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
-;   UR5410: call void @f
-;   UR5410: br i1 %{{.*}}, label %do.end, label %do.body.2, !prof !1
-;   UR5410: call void @f
-;   UR5410: br i1 %{{.*}}, label %do.end, label %do.body.3, !prof !2
-;   UR5410: call void @f
-;   UR5410: br i1 %{{.*}}, label %do.end, label %do.body.4, !prof !2
-;   UR5410: call void @f
-;   UR5410: br label %do.end
-;   UR5410: !0 = !{!"branch_weights", i32 0, i32 -2147483648}
-;   UR5410: !1 = !{!"branch_weights", i32 290200493, i32 1857283155}
-;   UR5410: !2 = !{!"branch_weights", i32 1, i32 3}
+;   UNIF5410: - do.body: float = 1.0,
+;   UNIF5410: - do.body.1: float = 0.88818,
+;   UNIF5410: - do.body.2: float = 0.78886,
+;   UNIF5410: - do.body.3: float = 0.70065,
+;   UNIF5410: - do.body.4: float = 0.62231,
+;   FAST5410: - do.body: float = 1.0,
+;   FAST5410: - do.body.1: float = 1.0,
+;   FAST5410: - do.body.2: float = 0.86486,
+;   FAST5410: - do.body.3: float = 0.64865,
+;   FAST5410: - do.body.4: float = 0.48649,
+;
+;   This is our first case where, when not using -unroll-uniform-weights, the
+;   implementation must adjust multiple probabilities to something other than
+;   the original latch probability but does not just set all probabilities to
+;   the limit of 1 or 0.
+;   UR5410:        call void @f
+;   UR5410:        br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
+;   UR5410:        call void @f
+;   UR5410:        br i1 %{{.*}}, label %do.end, label %do.body.2,
+;   UNIF5410-SAME:   !prof !0
+;   FAST5410-SAME:   !prof !1
+;   UR5410:        call void @f
+;   UR5410:        br i1 %{{.*}}, label %do.end, label %do.body.3,
+;   UNIF5410-SAME:   !prof !0
+;   FAST5410-SAME:   !prof !2
+;   UR5410:        call void @f
+;   UR5410:        br i1 %{{.*}}, label %do.end, label %do.body.4,
+;   UNIF5410-SAME:   !prof !0
+;   FAST5410-SAME:   !prof !2
+;   UR5410:        call void @f
+;   UR5410:        br label %do.end
+;   UNIF5410: !0 = !{!"branch_weights", i32 240132096, i32 1907351552}
+;   FAST5410: !0 = !{!"branch_weights", i32 0, i32 -2147483648}
+;   FAST5410: !1 = !{!"branch_weights", i32 290200493, i32 1857283155}
+;   FAST5410: !2 = !{!"branch_weights", i32 1, i32 3}
 ;
 ; Original loop body frequency is 1 (loop weight 0).
 ;
 ;   RUN: sed -e s/@MAX@/5/ -e s/@W@/0/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
 ;   RUN: %{bf-fc} ORIG5110
+;   RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR5110
+;   RUN: %{ur-bf} -unroll-count=6 -unroll-uniform-weights | %{fc} UR5110
 ;   RUN: %{ur-bf} -unroll-count=5 | %{fc} UR5110
 ;   RUN: %{ur-bf} -unroll-count=6 | %{fc} UR5110
 ;
diff --git 
a/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-partial-unconditional-latch.ll
 
b/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-partial-unconditional-latch.ll
index a87c9390ee780..84f92e519a1c3 100644
--- 
a/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-partial-unconditional-latch.ll
+++ 
b/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-partial-unconditional-latch.ll
@@ -67,6 +67,7 @@
 ; their original probabilities are not contradicted.  That is, the original 
loop
 ; latch's branch weights remain on all unrolled iterations' latches.
 ;
+;   RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} MULT3
 ;   RUN: %{ur-bf} -unroll-count=3 | %{fc} MULT3
 ;
 ;   Sums to approximately the original loop body frequency, 10.
@@ -91,7 +92,9 @@
 ;
 ;   -unroll-count=2, so there is 1 remaining conditional latch, so the
 ;   implementation can compute uniform weights by solving a linear equation.
+;   Thus, -unroll-uniform-weights has no effect.
 ;
+;     RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} MULT2
 ;     RUN: %{ur-bf} -unroll-count=2 | %{fc} MULT2
 ;
 ;     Multiply by 2 to get the original loop body frequency, 10.
@@ -111,7 +114,9 @@
 ;
 ;   -unroll-count=4, so there are 2 remaining conditional latches, so the
 ;   implementation can compute uniform weights using the quadratic formula.
+;   Thus, -unroll-uniform-weights has no effect.
 ;
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} MULT4
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} MULT4
 ;
 ;     Multiply by 2 and sum to get the original loop body frequency, 10.
@@ -136,14 +141,20 @@
 ;
 ;   -unroll-count=6, so there are 3 remaining conditional latches, the lowest
 ;   number where the implementation cannot compute uniform weights using a
-;   simple formula.
+;   simple formula.  Thus, this is our first case where -unroll-uniform-weights
+;   matters.
 ;
-;     RUN: %{ur-bf} -unroll-count=6 | %{fc} MULT6
+;     RUN: %{ur-bf} -unroll-count=6 -unroll-uniform-weights | %{fc} 
MULT6,MUNIF6
+;     RUN: %{ur-bf} -unroll-count=6 | %{fc} MULT6,MFAST6
 ;
-;     Multiply by 2 and sum to get the original loop body frequency, 10.
-;     MULT6: - do.body: float = 2.1956,
-;     MULT6: - do.body.2: float = 1.476,
-;     MULT6: - do.body.4: float = 1.3284,
+;     For either MUNIF or MFAST, multiply by 2 and sum to get the original loop
+;     body frequency, 10.
+;     MUNIF6: - do.body: float = 2.0492,
+;     MUNIF6: - do.body.2: float = 1.6393,
+;     MUNIF6: - do.body.4: float = 1.3115,
+;     MFAST6: - do.body: float = 2.1956,
+;     MFAST6: - do.body.2: float = 1.476,
+;     MFAST6: - do.body.4: float = 1.3284,
 ;
 ;     MULT6:       call void @f
 ;     MULT6-NOT:   br
@@ -152,19 +163,31 @@
 ;     MULT6:       call void @f
 ;     MULT6-NOT:   br
 ;     MULT6:       call void @f
-;     MULT6:       br i1 %{{.*}}, label %do.body.4, label %do.end, !prof !1
+;     MULT6:       br i1 %{{.*}}, label %do.body.4, label %do.end,
+;     MUNIF6-SAME:   !prof !0
+;     MFAST6-SAME:   !prof !1
 ;     MULT6:       call void @f
 ;     MULT6-NOT:   br
 ;     MULT6:       call void @f
-;     MULT6:       br i1 %{{.*}}, label %do.body, label %do.end, !prof !1, 
!llvm.loop !2
-;
-;     There are 3 conditional latches remaining, so it adjusts the first and
+;     MULT6:       br i1 %{{.*}}, label %do.body, label %do.end,
+;     MUNIF6-SAME:   !prof !0, !llvm.loop !1
+;     MFAST6-SAME:   !prof !1, !llvm.loop !2
+;
+;     MUNIF6 is like applying -unroll-count=3 to MULT2 without converting any
+;     additional conditional latches to unconditional, so (approximately)
+;     MULT2's branch weights make sense.
+;     MUNIF6: !0 = !{!"branch_weights", i32 1717986944, i32 429496704}
+;     MUNIF6: !1 = distinct !{!1, !2, !3}
+;     MUNIF6: !2 = !{!"llvm.loop.estimated_trip_count", i32 2}
+;     MUNIF6: !3 = !{!"llvm.loop.unroll.disable"}
+;
+;     There are 3 conditional latches remaining, so MFAST6 adjusts the first 
and
 ;     leaves the second two with the original loop's branch weights.
-;     MULT6: !0 = !{!"branch_weights", i32 1443686486, i32 703797162}
-;     MULT6: !1 = !{!"branch_weights", i32 9, i32 1}
-;     MULT6: !2 = distinct !{!2, !3, !4}
-;     MULT6: !3 = !{!"llvm.loop.estimated_trip_count", i32 2}
-;     MULT6: !4 = !{!"llvm.loop.unroll.disable"}
+;     MFAST6: !0 = !{!"branch_weights", i32 1443686486, i32 703797162}
+;     MFAST6: !1 = !{!"branch_weights", i32 9, i32 1}
+;     MFAST6: !2 = distinct !{!2, !3, !4}
+;     MFAST6: !3 = !{!"llvm.loop.estimated_trip_count", i32 2}
+;     MFAST6: !4 = !{!"llvm.loop.unroll.disable"}
 
 ; 
------------------------------------------------------------------------------
 ; Check case when the original loop's number of iterations is a run-time
@@ -185,6 +208,7 @@
 ; implementation tries to compute uniform weights by solving a linear equation
 ; but ultimately sets the latch's probability to zero.
 ;
+;   RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} LOW2
 ;   RUN: %{ur-bf} -unroll-count=2 | %{fc} LOW2
 ;
 ;   Multiply by 2, but the result is greater than the original loop body
@@ -205,6 +229,7 @@
 ; implementation tries to compute uniform weights using the quadratic formula
 ; but ultimately sets both latches' probabilities to zero.
 ;
+;   RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} LOW4
 ;   RUN: %{ur-bf} -unroll-count=4 | %{fc} LOW4
 ;
 ;   Multiply by 2 and sum, but the result is greater than the original loop 
body
@@ -228,12 +253,13 @@
 ;
 ; -unroll-count=6, so there are 3 remaining conditional latches.  The
 ; implementation cannot compute uniform weights using a simple formula, and
-; ultimately it must set all those latches' probabilities to zero.  The
-; implementation will face a new stumbling block starting at the second latch:
-; reaching the remaining iterations already has a zero probability due to the
-; zero probability set at the first latch, so the required probability could
-; accidentally be computed as negative infinity.
+; ultimately it must set all those latches' probabilities to zero.  If not
+; -unroll-uniform-weights, then the implementation will face a new stumbling
+; block starting at the second latch: reaching the remaining iterations already
+; has a zero probability due to the zero probability set at the first latch, so
+; the required probability could accidentally be computed as negative infinity.
 ;
+;   RUN: %{ur-bf} -unroll-count=6 -unroll-uniform-weights | %{fc} LOW6
 ;   RUN: %{ur-bf} -unroll-count=6 | %{fc} LOW6
 ;
 ;   Multiply by 2 and sum, but the result is greater than the original loop 
body
@@ -269,7 +295,7 @@
 ; Because we test only partial unrolling, there is always exactly one unrolled
 ; iteration that can possibly exit, so only its latch can remain conditional.
 ; Because there is only one, its branch weights can be computed with a simple
-; formula.
+; formula, and -unroll-uniform-weights does not matter.
 ;
 ; Check the original loop body frequency.
 ;
@@ -278,6 +304,7 @@
 ;
 ; Check when the unrolled loop's backedge remains conditional.
 ;
+;   RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} CONST2
 ;   RUN: %{ur-bf} -unroll-count=2 | %{fc} CONST2
 ;
 ;   Multiply by 2 to get the original loop body frequency, 10.
@@ -296,6 +323,7 @@
 ;
 ; Check when the unrolled loop's backedge unconditionally continues.
 ;
+;   RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} CONST4
 ;   RUN: %{ur-bf} -unroll-count=4 | %{fc} CONST4
 ;
 ;   Multiply by 2 and sum to get the original loop body frequency, 10.

_______________________________________________
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [LoopUnroll] Fix freqs for unconditional latches: N>2, uniform (PR #182405)

Reply via email to