https://github.com/jdenny-ornl created https://github.com/llvm/llvm-project/pull/182405
This patch introduces the command-line option `-unroll-uniform-weights`. When computing probabilities for the remaining N conditional latches in the unrolled loop after converting some iterations' latches to unconditional, LoopUnroll now supports the following three strategies: - A. If N <= 2, use a simple formula to compute a single uniform probability across those latches. - B. Otherwise, if `-unroll-uniform-weights` is not specified, apply the original loop's probability to all N latches and then, as needed, adjust as few of them as possible. - C. Otherwise, bisect the range [0,1] to find a single uniform probability across all N latches. This patch implements this strategy. An issue with C is that it could impact compiler performance, so this patch makes it opt-in. Its appeal over B is that it treats all latches the same given that we have no evidence showing that any latch should have a higher or lower probability than any other. A has neither problem, but I do not know how to apply it for N > 2. More experience or feedback from others might determine that some strategies are not worthwhile to maintain. >From d3054a4e08388bda0bb58a0557b871dd6c6cb9df Mon Sep 17 00:00:00 2001 From: "Joel E. Denny" <[email protected]> Date: Thu, 19 Feb 2026 17:53:59 -0500 Subject: [PATCH] [LoopUnroll] Fix freqs for unconditional latches: N>2, uniform This patch introduces the command-line option `-unroll-uniform-weights`. When computing probabilities for the remaining N conditional latches in the unrolled loop after converting some iterations' latches to unconditional, LoopUnroll now supports the following three strategies: - A. If N <= 2, use a simple formula to compute a single uniform probability across those latches. - B. Otherwise, if `-unroll-uniform-weights` is not specified, apply the original loop's probability to all N latches and then, as needed, adjust as few of them as possible. - C. Otherwise, bisect the range [0,1] to find a single uniform probability across all N latches. This patch implements this strategy. An issue with C is that it could impact compiler performance, so this patch makes it opt-in. Its appeal over B is that it treats all latches the same given that we have no evidence showing that any latch should have a higher or lower probability than any other. A has neither problem, but I do not know how to apply it for N > 2. More experience or feedback from others might determine that some strategies are not worthwhile to maintain. --- llvm/lib/Transforms/Utils/LoopUnroll.cpp | 112 +++++++- .../branch-weights-freq/unroll-complete.ll | 267 +++++++++++++----- .../unroll-partial-unconditional-latch.ll | 70 +++-- 3 files changed, 336 insertions(+), 113 deletions(-) diff --git a/llvm/lib/Transforms/Utils/LoopUnroll.cpp b/llvm/lib/Transforms/Utils/LoopUnroll.cpp index 404e254c8a66f..1028b0f026ea3 100644 --- a/llvm/lib/Transforms/Utils/LoopUnroll.cpp +++ b/llvm/lib/Transforms/Utils/LoopUnroll.cpp @@ -89,6 +89,11 @@ UnrollRuntimeEpilog("unroll-runtime-epilog", cl::init(false), cl::Hidden, cl::desc("Allow runtime unrolled loops to be unrolled " "with epilog instead of prolog.")); +static cl::opt<bool> UnrollUniformWeights( + "unroll-uniform-weights", cl::init(false), cl::Hidden, + cl::desc("If new branch weights must be found, work harder to keep them " + "uniform.")); + static cl::opt<bool> UnrollVerifyDomtree("unroll-verify-domtree", cl::Hidden, cl::desc("Verify domtree after unrolling"), @@ -493,9 +498,9 @@ static bool canHaveUnrollRemainder(const Loop *L) { // // There are often many sets of latch probabilities that can produce the // original total loop body frequency. If there are many remaining conditional -// latches, this function just quickly hacks a few of their probabilities to -// restore the original total loop body frequency. Otherwise, it determines -// less arbitrary probabilities. +// latches and !UnrollUniformWeights, this function just quickly hacks a few of +// their probabilities to restore the original total loop body frequency. +// Otherwise, it tries harder to determine less arbitrary probabilities. static void fixProbContradiction(UnrollLoopOptions ULO, BranchProbability OriginalLoopProb, bool CompletelyUnroll, @@ -580,10 +585,11 @@ static void fixProbContradiction(UnrollLoopOptions ULO, SetProb(I, Prob); }; - // If n <= 2, we choose the simplest probability model we can think of: every - // remaining conditional branch instruction has the same probability, Prob, - // of continuing to the next iteration. This model has several helpful - // properties: + // If UnrollUniformWeights or n <= 2, we choose the simplest probability model + // we can think of: every remaining conditional branch instruction has the + // same probability, Prob, of continuing to the next iteration. This model + // has several helpful properties: + // - There is only one search parameter, Prob. // - We have no reason to think one latch branch's probability should be // higher or lower than another, and so this model makes them all the same. // In the worst cases, we thus avoid setting just some probabilities to 0 or @@ -596,14 +602,32 @@ static void fixProbContradiction(UnrollLoopOptions ULO, // // FreqOne = Sum(i=0..n)(c_i * p^i) // - // - If the backedge has been eliminated, FreqOne is the total frequency of - // the original loop body in the unrolled loop. - // - If the backedge remains, Sum(i=0..inf)(FreqOne * p^(n*i)) = - // FreqOne / (1 - p^n) is the total frequency of the original loop body in - // the unrolled loop, regardless of whether the backedge is conditional or - // unconditional. + // - If the backedge has been eliminated: + // - FreqOne is the total frequency of the original loop body in the + // unrolled loop. + // - If Prob == 1, the total frequency of the original loop body is exactly + // the number of remaining loop iterations, as expected because every + // remaining loop iteration always then executes. + // - If the backedge remains: + // - Sum(i=0..inf)(FreqOne * p^(n*i)) = FreqOne / (1 - p^n) is the total + // frequency of the original loop body in the unrolled loop, regardless of + // whether the backedge is conditional or unconditional. + // - As Prob approaches 1, the total frequency of the original loop body + // approaches infinity, as expected because the loop approaches never + // exiting. // - For n <= 2, we can use simple formulas to solve the above polynomial // equations exactly for p without performing a search. + // - For n > 2, evaluating each point in the search space, using ComputeFreq + // below, requires about as few instructions as we could hope for. That is, + // the probability is constant across the conditional branches, so the only + // computation is across conditional branches and any backedge, as required + // for any model for Prob. + // - Prob == 1 produces the maximum possible total frequency for the original + // loop body, as described above. Prob == 0 produces the minimum, 0. + // Increasing or decreasing Prob monotonically increases or decreases the + // frequency, respectively. Thus, for every possible frequency, there + // exists some Prob that can produce it, and we can easily use bisection to + // search the problem space. // When iterating for a solution, we stop early if we find probabilities // that produce a Freq whose difference from FreqDesired is small @@ -611,6 +635,21 @@ static void fixProbContradiction(UnrollLoopOptions ULO, // accurate (but surely far more accurate). const double FreqPrec = 1e-6; + // Compute the new frequency produced by using Prob throughout CondLatches. + auto ComputeFreq = [&](double Prob) { + double ProbReaching = 1; // p^0 + double FreqOne = IterCounts[0]; // c_0*p^0 + for (unsigned I = 0, E = CondLatches.size(); I < E; ++I) { + ProbReaching *= Prob; // p^(I+1) + FreqOne += IterCounts[I + 1] * ProbReaching; // c_(I+1)*p^(I+1) + } + double ProbReachingBackedge = CompletelyUnroll ? 0 : ProbReaching; + assert(FreqOne > 0 && "Expected at least one iteration before first latch"); + if (ProbReachingBackedge == 1) + return std::numeric_limits<double>::infinity(); + return FreqOne / (1 - ProbReachingBackedge); + }; + // Compute the probability that, used at CondLaches[0] where // CondLatches.size() == 1, gets as close as possible to FreqDesired. auto ComputeProbForLinear = [&]() { @@ -619,6 +658,11 @@ static void fixProbContradiction(UnrollLoopOptions ULO, double B = IterCounts[0] - FreqDesired; assert(A > 0 && "Expected iterations after last conditional latch"); double Prob = -B / A; + // If it computes an invalid Prob, FreqDesired is impossibly low or high. + // Otherwise, Prob should produce nearly FreqDesired. + assert((Prob < 0 || Prob > 1 || + fabs(ComputeFreq(Prob) - FreqDesired) < FreqPrec) && + "Expected accurate frequency when linear case is possible"); Prob = std::max(Prob, 0.); Prob = std::min(Prob, 1.); return Prob; @@ -633,6 +677,11 @@ static void fixProbContradiction(UnrollLoopOptions ULO, double C = IterCounts[0] - FreqDesired; assert(A > 0 && "Expected iterations after last conditional latch"); double Prob = (-B + sqrt(B * B - 4 * A * C)) / (2 * A); + // If it computes an invalid Prob, FreqDesired is impossibly low or high. + // Otherwise, Prob should produce nearly FreqDesired. + assert((Prob < 0 || Prob > 1 || + fabs(ComputeFreq(Prob) - FreqDesired) < FreqPrec) && + "Expected accurate frequency when quadratic case is possible"); Prob = std::max(Prob, 0.); Prob = std::min(Prob, 1.); return Prob; @@ -733,12 +782,17 @@ static void fixProbContradiction(UnrollLoopOptions ULO, }; // Determine and set branch weights. + // + // Prob < 0 and Prob > 1 cannot be represented as branch weights. We might + // compute such a Prob if FreqDesired is impossible (e.g., due to inaccurate + // profile data) for the maximum trip count we have determined when completely + // unrolling. In that case, so just go with whichever is closest. if (CondLatches.size() == 1) { SetAllProbs(ComputeProbForLinear()); } else if (CondLatches.size() == 2) { SetAllProbs(ComputeProbForQuadratic()); - } else { - // The polynomial is too complex for a simple formula, so the quick and + } else if (!UnrollUniformWeights) { + // The polynomial is too complex for a simple formula, and the quick and // dirty fix has been selected. Adjust probabilities starting from the // first latch, which has the most influence on the total frequency, so // starting there should minimize the number of latches that have to be @@ -752,6 +806,34 @@ static void fixProbContradiction(UnrollLoopOptions ULO, if (fabs(Freq - FreqDesired) < FreqPrec) break; } + } else { + // The polynomial is too complex for a simple formula, and uniform branch + // weights have been selected, so bisect. + double ProbMin, ProbMax, ProbPrev; + auto TryProb = [&](double Prob) { + ProbPrev = Prob; + double FreqDelta = ComputeFreq(Prob) - FreqDesired; + if (fabs(FreqDelta) < FreqPrec) + return 0; + if (FreqDelta < 0) { + ProbMin = Prob; + return -1; + } + ProbMax = Prob; + return 1; + }; + // If Prob == 0 is too small and Prob == 1 is too large, bisect between + // them. To place a hard upper limit on the search time, stop bisecting + // when Prob stops changing (ProbDelta) by much (ProbPrec). + if (TryProb(0.) < 0 && TryProb(1.) > 0) { + const double ProbPrec = 1e-12; + double Prob, ProbDelta; + do { + Prob = (ProbMin + ProbMax) / 2; + ProbDelta = Prob - ProbPrev; + } while (TryProb(Prob) != 0 && fabs(ProbDelta) > ProbPrec); + } + SetAllProbs(ProbPrev); } // FIXME: We have not considered non-latch loop exits: diff --git a/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-complete.ll b/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-complete.ll index 353e74be9fbd1..69da20802a0ae 100644 --- a/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-complete.ll +++ b/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-complete.ll @@ -58,12 +58,14 @@ ; Check 1 max iteration: ; - Unroll count of >=1 should always produce complete unrolling. ; - That produces 0 unrolled iteration latches, so there are no branch weights -; to compute. +; to compute. Thus, -unroll-uniform-weights has no effect. ; ; Original loop body frequency is 2 (loop weight 1), which is impossibly high. ; ; RUN: sed -e s/@MAX@/1/ -e s/@W@/1/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll ; RUN: %{bf-fc} ORIG1210 +; RUN: %{ur-bf} -unroll-count=1 -unroll-uniform-weights | %{fc} UR1210 +; RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR1210 ; RUN: %{ur-bf} -unroll-count=1 | %{fc} UR1210 ; RUN: %{ur-bf} -unroll-count=2 | %{fc} UR1210 ; @@ -77,6 +79,8 @@ ; ; RUN: sed -e s/@MAX@/1/ -e s/@W@/0/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll ; RUN: %{bf-fc} ORIG1110 +; RUN: %{ur-bf} -unroll-count=1 -unroll-uniform-weights | %{fc} UR1110 +; RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR1110 ; RUN: %{ur-bf} -unroll-count=1 | %{fc} UR1110 ; RUN: %{ur-bf} -unroll-count=2 | %{fc} UR1110 ; @@ -90,7 +94,8 @@ ; Check 2 max iterations: ; - Unroll count of >=2 should always produce complete unrolling. ; - That produces <=1 unrolled iteration latch, so the implementation can -; compute uniform weights by solving, at worst, a linear equation. +; compute uniform weights by solving, at worst, a linear equation. Thus, +; -unroll-uniform-weights has no effect. ; ; Original loop body frequency is 3 (loop weight 2), which is impossibly high. ; @@ -99,6 +104,8 @@ ; ; RUN: sed -e s/@MAX@/2/ -e s/@W@/2/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll ; RUN: %{bf-fc} ORIG2310 +; RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2310 +; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2310 ; RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2310 ; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2310 ; @@ -120,6 +127,8 @@ ; ; RUN: sed -e s/@MAX@/2/ -e s/@W@/2/ -e s/@MIN@/2/ -e s/@I_0@/0/ %s > %t.ll ; RUN: %{bf-fc} ORIG2320 +; RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2320 +; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2320 ; RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2320 ; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2320 ; @@ -140,6 +149,8 @@ ; ; RUN: sed -e s/@MAX@/2/ -e s/@W@/1/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll ; RUN: %{bf-fc} ORIG2210 +; RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2210 +; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2210 ; RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2210 ; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2210 ; @@ -159,6 +170,8 @@ ; ; RUN: sed -e s/@MAX@/2/ -e s/@W@/1/ -e s/@MIN@/2/ -e s/@I_0@/0/ %s > %t.ll ; RUN: %{bf-fc} ORIG2220 +; RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2220 +; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2220 ; RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2220 ; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2220 ; @@ -179,6 +192,8 @@ ; ; RUN: sed -e s/@MAX@/2/ -e s/@W@/0/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll ; RUN: %{bf-fc} ORIG2110 +; RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2110 +; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2110 ; RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2110 ; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2110 ; @@ -198,6 +213,8 @@ ; ; RUN: sed -e s/@MAX@/2/ -e s/@W@/0/ -e s/@MIN@/2/ -e s/@I_0@/0/ %s > %t.ll ; RUN: %{bf-fc} ORIG2120 +; RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2120 +; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2120 ; RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2120 ; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2120 ; @@ -215,7 +232,8 @@ ; Check 3 max iterations: ; - Unroll count of >=3 should always produce complete unrolling. ; - That produces <=2 unrolled iteration latches, so the implementation can -; compute uniform weights solving, at worst, a quadratic equation. +; compute uniform weights solving, at worst, a quadratic equation. Thus, +; -unroll-uniform-weights has no effect. ; ; Original loop body frequency is 4 (loop weight 3), which is impossibly high. ; @@ -224,6 +242,8 @@ ; ; RUN: sed -e s/@MAX@/3/ -e s/@W@/3/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll ; RUN: %{bf-fc} ORIG3410 +; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3410 +; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3410 ; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3410 ; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3410 ; @@ -248,6 +268,8 @@ ; ; RUN: sed -e s/@MAX@/3/ -e s/@W@/3/ -e s/@MIN@/3/ -e s/@I_0@/0/ %s > %t.ll ; RUN: %{bf-fc} ORIG3430 +; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3430 +; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3430 ; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3430 ; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3430 ; @@ -269,6 +291,8 @@ ; ; RUN: sed -e s/@MAX@/3/ -e s/@W@/3/ -e s/@MIN@/3/ -e s/@I_0@/%x/ %s > %t.ll ; RUN: %{bf-fc} ORIG343x +; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR343x +; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR343x ; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR343x ; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR343x ; @@ -295,6 +319,8 @@ ; ; RUN: sed -e s/@MAX@/3/ -e s/@W@/2/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll ; RUN: %{bf-fc} ORIG3310 +; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3310 +; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3310 ; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3310 ; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3310 ; @@ -317,6 +343,8 @@ ; ; RUN: sed -e s/@MAX@/3/ -e s/@W@/2/ -e s/@MIN@/3/ -e s/@I_0@/0/ %s > %t.ll ; RUN: %{bf-fc} ORIG3330 +; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3330 +; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3330 ; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3330 ; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3330 ; @@ -338,6 +366,8 @@ ; ; RUN: sed -e s/@MAX@/3/ -e s/@W@/2/ -e s/@MIN@/3/ -e s/@I_0@/%x/ %s > %t.ll ; RUN: %{bf-fc} ORIG333x +; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR333x +; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR333x ; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR333x ; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR333x ; @@ -363,6 +393,8 @@ ; ; RUN: sed -e s/@MAX@/3/ -e s/@W@/1/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll ; RUN: %{bf-fc} ORIG3210 +; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3210 +; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3210 ; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3210 ; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3210 ; @@ -385,6 +417,8 @@ ; ; RUN: sed -e s/@MAX@/3/ -e s/@W@/1/ -e s/@MIN@/3/ -e s/@I_0@/0/ %s > %t.ll ; RUN: %{bf-fc} ORIG3230 +; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3230 +; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3230 ; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3230 ; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3230 ; @@ -406,6 +440,8 @@ ; ; RUN: sed -e s/@MAX@/3/ -e s/@W@/1/ -e s/@MIN@/3/ -e s/@I_0@/%x/ %s > %t.ll ; RUN: %{bf-fc} ORIG323x +; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR323x +; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR323x ; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR323x ; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR323x ; @@ -430,6 +466,8 @@ ; ; RUN: sed -e s/@MAX@/3/ -e s/@W@/0/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll ; RUN: %{bf-fc} ORIG3110 +; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3110 +; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3110 ; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3110 ; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3110 ; @@ -452,6 +490,8 @@ ; ; RUN: sed -e s/@MAX@/3/ -e s/@W@/0/ -e s/@MIN@/3/ -e s/@I_0@/0/ %s > %t.ll ; RUN: %{bf-fc} ORIG3130 +; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3130 +; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3130 ; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3130 ; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3130 ; @@ -473,6 +513,8 @@ ; ; RUN: sed -e s/@MAX@/3/ -e s/@W@/0/ -e s/@MIN@/3/ -e s/@I_0@/%x/ %s > %t.ll ; RUN: %{bf-fc} ORIG313x +; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR313x +; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR313x ; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR313x ; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR313x ; @@ -496,6 +538,7 @@ ; - Unroll count of >=4 should always produce complete unrolling. ; - That produces <=3 unrolled iteration latches. 3 is the lowest number where ; the implementation cannot compute uniform weights using a simple formula. +; Thus, this is our first case where -unroll-uniform-weights matters. ; ; Original loop body frequency is 5 (loop weight 4), which is impossibly high. ; @@ -504,6 +547,8 @@ ; ; RUN: sed -e s/@MAX@/4/ -e s/@W@/4/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll ; RUN: %{bf-fc} ORIG4510 +; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4510 +; RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4510 ; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4510 ; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4510 ; @@ -531,6 +576,8 @@ ; ; RUN: sed -e s/@MAX@/4/ -e s/@W@/4/ -e s/@MIN@/4/ -e s/@I_0@/0/ %s > %t.ll ; RUN: %{bf-fc} ORIG4540 +; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4540 +; RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4540 ; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4540 ; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4540 ; @@ -554,6 +601,8 @@ ; ; RUN: sed -e s/@MAX@/4/ -e s/@W@/4/ -e s/@MIN@/4/ -e s/@I_0@/%x/ %s > %t.ll ; RUN: %{bf-fc} ORIG454x +; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR454x +; RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR454x ; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR454x ; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR454x ; @@ -582,6 +631,8 @@ ; ; RUN: sed -e s/@MAX@/4/ -e s/@W@/3/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll ; RUN: %{bf-fc} ORIG4410 +; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4410 +; RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4410 ; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4410 ; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4410 ; @@ -607,6 +658,8 @@ ; ; RUN: sed -e s/@MAX@/4/ -e s/@W@/3/ -e s/@MIN@/4/ -e s/@I_0@/0/ %s > %t.ll ; RUN: %{bf-fc} ORIG4440 +; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4440 +; RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4440 ; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4440 ; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4440 ; @@ -630,6 +683,8 @@ ; ; RUN: sed -e s/@MAX@/4/ -e s/@W@/3/ -e s/@MIN@/4/ -e s/@I_0@/%x/ %s > %t.ll ; RUN: %{bf-fc} ORIG444x +; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR444x +; RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR444x ; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR444x ; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR444x ; @@ -650,39 +705,52 @@ ; UR444x: !0 = !{!"branch_weights", i32 0, i32 -2147483648} ; ; Original loop body frequency is 3 (loop weight 2). This is our first case -; where the new probabilities vary. +; where the new probabilities vary (unless -unroll-uniform-weights). ; ; First use a variable iteration count so that all non-final unrolled ; iterations' latches remain conditional. ; ; RUN: sed -e s/@MAX@/4/ -e s/@W@/2/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll ; RUN: %{bf-fc} ORIG4310 -; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4310 -; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4310 +; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4310,UNIF4310 +; RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4310,UNIF4310 +; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4310,FAST4310 +; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4310,FAST4310 ; ; The sum of the new do.body* is always approximately the old do.body. ; ORIG4310: - do.body: float = 3.0, -; UR4310: - do.body: float = 1.0, -; UR4310: - do.body.1: float = 0.94737, -; UR4310: - do.body.2: float = 0.63158, -; UR4310: - do.body.3: float = 0.42105, -; -; UR4310: call void @f -; UR4310: br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0 -; UR4310: call void @f -; UR4310: br i1 %{{.*}}, label %do.end, label %do.body.2, !prof !1 -; UR4310: call void @f -; UR4310: br i1 %{{.*}}, label %do.end, label %do.body.3, !prof !1 -; UR4310: call void @f -; UR4310: br label %do.end -; UR4310: !0 = !{!"branch_weights", i32 113025456, i32 2034458192} -; UR4310: !1 = !{!"branch_weights", i32 1, i32 2} +; UNIF4310: - do.body: float = 1.0, +; UNIF4310: - do.body.1: float = 0.81054, +; UNIF4310: - do.body.2: float = 0.65697, +; UNIF4310: - do.body.3: float = 0.5325, +; FAST4310: - do.body: float = 1.0, +; FAST4310: - do.body.1: float = 0.94737, +; FAST4310: - do.body.2: float = 0.63158, +; FAST4310: - do.body.3: float = 0.42105, +; +; UR4310: call void @f +; UR4310: br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0 +; UR4310: call void @f +; UR4310: br i1 %{{.*}}, label %do.end, label %do.body.2, +; UNIF4310-SAME: !prof !0 +; FAST4310-SAME: !prof !1 +; UR4310: call void @f +; UR4310: br i1 %{{.*}}, label %do.end, label %do.body.3, +; UNIF4310-SAME: !prof !0 +; FAST4310-SAME: !prof !1 +; UR4310: call void @f +; UR4310: br label %do.end +; UNIF4310: !0 = !{!"branch_weights", i32 406871040, i32 1740612608} +; FAST4310: !0 = !{!"branch_weights", i32 113025456, i32 2034458192} +; FAST4310: !1 = !{!"branch_weights", i32 1, i32 2} ; ; Now use a constant iteration count so that all non-final unrolled ; iterations' latches unconditionally continue. ; ; RUN: sed -e s/@MAX@/4/ -e s/@W@/2/ -e s/@MIN@/4/ -e s/@I_0@/0/ %s > %t.ll ; RUN: %{bf-fc} ORIG4340 +; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4340 +; RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4340 ; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4340 ; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4340 ; @@ -706,6 +774,8 @@ ; ; RUN: sed -e s/@MAX@/4/ -e s/@W@/2/ -e s/@MIN@/4/ -e s/@I_0@/%x/ %s > %t.ll ; RUN: %{bf-fc} ORIG434x +; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR434x +; RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR434x ; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR434x ; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR434x ; @@ -732,32 +802,45 @@ ; ; RUN: sed -e s/@MAX@/4/ -e s/@W@/1/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll ; RUN: %{bf-fc} ORIG4210 -; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4210 -; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4210 +; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4210,UNIF4210 +; RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4210,UNIF4210 +; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4210,FAST4210 +; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4210,FAST4210 ; ; The sum of the new do.body* is always the old do.body. ; ORIG4210: - do.body: float = 2.0, -; UR4210: - do.body: float = 1.0, -; UR4210: - do.body.1: float = 0.57143, -; UR4210: - do.body.2: float = 0.28571, -; UR4210: - do.body.3: float = 0.14286, -; -; UR4210: call void @f -; UR4210: br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0 -; UR4210: call void @f -; UR4210: br i1 %{{.*}}, label %do.end, label %do.body.2, !prof !1 -; UR4210: call void @f -; UR4210: br i1 %{{.*}}, label %do.end, label %do.body.3, !prof !1 -; UR4210: call void @f -; UR4210: br label %do.end -; UR4210: !0 = !{!"branch_weights", i32 920350135, i32 1227133513} -; UR4210: !1 = !{!"branch_weights", i32 1, i32 1} +; UNIF4210: - do.body: float = 1.0, +; UNIF4210: - do.body.1: float = 0.54369, +; UNIF4210: - do.body.2: float = 0.2956, +; UNIF4210: - do.body.3: float = 0.16071, +; FAST4210: - do.body: float = 1.0, +; FAST4210: - do.body.1: float = 0.57143, +; FAST4210: - do.body.2: float = 0.28571, +; FAST4210: - do.body.3: float = 0.14286, +; +; UR4210: call void @f +; UR4210: br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0 +; UR4210: call void @f +; UR4210: br i1 %{{.*}}, label %do.end, label %do.body.2, +; UNIF4210-SAME: !prof !0 +; FAST4210-SAME: !prof !1 +; UR4210: call void @f +; UR4210: br i1 %{{.*}}, label %do.end, label %do.body.3, +; UNIF4210-SAME: !prof !0 +; FAST4210-SAME: !prof !1 +; UR4210: call void @f +; UR4210: br label %do.end +; UNIF4210: !0 = !{!"branch_weights", i32 979920896, i32 1167562752} +; FAST4210: !0 = !{!"branch_weights", i32 920350135, i32 1227133513} +; FAST4210: !1 = !{!"branch_weights", i32 1, i32 1} ; ; Now use a constant iteration count so that all non-final unrolled ; iterations' latches unconditionally continue. ; ; RUN: sed -e s/@MAX@/4/ -e s/@W@/1/ -e s/@MIN@/4/ -e s/@I_0@/0/ %s > %t.ll ; RUN: %{bf-fc} ORIG4240 +; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4240 +; RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4240 ; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4240 ; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4240 ; @@ -781,6 +864,8 @@ ; ; RUN: sed -e s/@MAX@/4/ -e s/@W@/1/ -e s/@MIN@/4/ -e s/@I_0@/%x/ %s > %t.ll ; RUN: %{bf-fc} ORIG424x +; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR424x +; RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR424x ; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR424x ; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR424x ; @@ -807,6 +892,8 @@ ; ; RUN: sed -e s/@MAX@/4/ -e s/@W@/0/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll ; RUN: %{bf-fc} ORIG4110 +; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4110 +; RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4110 ; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4110 ; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4110 ; @@ -832,6 +919,8 @@ ; ; RUN: sed -e s/@MAX@/4/ -e s/@W@/0/ -e s/@MIN@/4/ -e s/@I_0@/0/ %s > %t.ll ; RUN: %{bf-fc} ORIG4140 +; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4140 +; RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4140 ; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4140 ; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4140 ; @@ -855,6 +944,8 @@ ; ; RUN: sed -e s/@MAX@/4/ -e s/@W@/0/ -e s/@MIN@/4/ -e s/@I_0@/%x/ %s > %t.ll ; RUN: %{bf-fc} ORIG414x +; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR414x +; RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR414x ; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR414x ; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR414x ; @@ -880,14 +971,16 @@ ; - Unroll count of >=5 should always produce complete unrolling. ; - That produces <=4 unrolled iteration latches. When at least 3 remain ; conditional, the implementation cannot compute uniform weights using a -; simple formula. +; simple formula, so -unroll-uniform-weights matters. ; ; Original loop body frequency is 5 (loop weight 4). ; ; RUN: sed -e s/@MAX@/5/ -e s/@W@/4/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll ; RUN: %{bf-fc} ORIG5510 -; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR5510 -; RUN: %{ur-bf} -unroll-count=6 | %{fc} UR5510 +; RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR5510,UNIF5510 +; RUN: %{ur-bf} -unroll-count=6 -unroll-uniform-weights | %{fc} UR5510,UNIF5510 +; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR5510,FAST5510 +; RUN: %{ur-bf} -unroll-count=6 | %{fc} UR5510,FAST5510 ; ; The sum of the new do.body* is the old do.body. ; ORIG5510: - do.body: float = 5.0, @@ -899,55 +992,75 @@ ; ; All continue probabilities are approximately 1, but somehow there is less ; precision in the calculation of the last case. -; UR5510: call void @f -; UR5510: br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0 -; UR5510: call void @f -; UR5510: br i1 %{{.*}}, label %do.end, label %do.body.2, !prof !0 -; UR5510: call void @f -; UR5510: br i1 %{{.*}}, label %do.end, label %do.body.3, !prof !0 -; UR5510: call void @f -; UR5510: br i1 %{{.*}}, label %do.end, label %do.body.4, !prof !1 -; UR5510: call void @f -; UR5510: br label %do.end -; UR5510: !0 = !{!"branch_weights", i32 0, i32 -2147483648} -; UR5510: !1 = !{!"branch_weights", i32 10, i32 2147483638} +; UR5510: call void @f +; UR5510: br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0 +; UR5510: call void @f +; UR5510: br i1 %{{.*}}, label %do.end, label %do.body.2, !prof !0 +; UR5510: call void @f +; UR5510: br i1 %{{.*}}, label %do.end, label %do.body.3, !prof !0 +; UR5510: call void @f +; UR5510: br i1 %{{.*}}, label %do.end, label %do.body.4, +; UNIF5510-SAME: !prof !0 +; FAST5510-SAME: !prof !1 +; UR5510: call void @f +; UR5510: br label %do.end +; UNIF5510: !0 = !{!"branch_weights", i32 0, i32 -2147483648} +; FAST5510: !0 = !{!"branch_weights", i32 0, i32 -2147483648} +; FAST5510: !1 = !{!"branch_weights", i32 10, i32 2147483638} ; ; Original loop body frequency is 4 (loop weight 3). ; ; RUN: sed -e s/@MAX@/5/ -e s/@W@/3/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll ; RUN: %{bf-fc} ORIG5410 -; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR5410 -; RUN: %{ur-bf} -unroll-count=6 | %{fc} UR5410 +; RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR5410,UNIF5410 +; RUN: %{ur-bf} -unroll-count=6 -unroll-uniform-weights | %{fc} UR5410,UNIF5410 +; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR5410,FAST5410 +; RUN: %{ur-bf} -unroll-count=6 | %{fc} UR5410,FAST5410 ; ; The sum of the new do.body* is always the old do.body. ; ORIG5410: - do.body: float = 4.0, -; UR5410: - do.body: float = 1.0, -; UR5410: - do.body.1: float = 1.0, -; UR5410: - do.body.2: float = 0.86486, -; UR5410: - do.body.3: float = 0.64865, -; UR5410: - do.body.4: float = 0.48649, -; -; This is our first case where the implementation must adjust multiple -; probabilities to something other than the original latch probability but -; does not just set all probabilities to the limit of 1 or 0. -; UR5410: call void @f -; UR5410: br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0 -; UR5410: call void @f -; UR5410: br i1 %{{.*}}, label %do.end, label %do.body.2, !prof !1 -; UR5410: call void @f -; UR5410: br i1 %{{.*}}, label %do.end, label %do.body.3, !prof !2 -; UR5410: call void @f -; UR5410: br i1 %{{.*}}, label %do.end, label %do.body.4, !prof !2 -; UR5410: call void @f -; UR5410: br label %do.end -; UR5410: !0 = !{!"branch_weights", i32 0, i32 -2147483648} -; UR5410: !1 = !{!"branch_weights", i32 290200493, i32 1857283155} -; UR5410: !2 = !{!"branch_weights", i32 1, i32 3} +; UNIF5410: - do.body: float = 1.0, +; UNIF5410: - do.body.1: float = 0.88818, +; UNIF5410: - do.body.2: float = 0.78886, +; UNIF5410: - do.body.3: float = 0.70065, +; UNIF5410: - do.body.4: float = 0.62231, +; FAST5410: - do.body: float = 1.0, +; FAST5410: - do.body.1: float = 1.0, +; FAST5410: - do.body.2: float = 0.86486, +; FAST5410: - do.body.3: float = 0.64865, +; FAST5410: - do.body.4: float = 0.48649, +; +; This is our first case where, when not using -unroll-uniform-weights, the +; implementation must adjust multiple probabilities to something other than +; the original latch probability but does not just set all probabilities to +; the limit of 1 or 0. +; UR5410: call void @f +; UR5410: br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0 +; UR5410: call void @f +; UR5410: br i1 %{{.*}}, label %do.end, label %do.body.2, +; UNIF5410-SAME: !prof !0 +; FAST5410-SAME: !prof !1 +; UR5410: call void @f +; UR5410: br i1 %{{.*}}, label %do.end, label %do.body.3, +; UNIF5410-SAME: !prof !0 +; FAST5410-SAME: !prof !2 +; UR5410: call void @f +; UR5410: br i1 %{{.*}}, label %do.end, label %do.body.4, +; UNIF5410-SAME: !prof !0 +; FAST5410-SAME: !prof !2 +; UR5410: call void @f +; UR5410: br label %do.end +; UNIF5410: !0 = !{!"branch_weights", i32 240132096, i32 1907351552} +; FAST5410: !0 = !{!"branch_weights", i32 0, i32 -2147483648} +; FAST5410: !1 = !{!"branch_weights", i32 290200493, i32 1857283155} +; FAST5410: !2 = !{!"branch_weights", i32 1, i32 3} ; ; Original loop body frequency is 1 (loop weight 0). ; ; RUN: sed -e s/@MAX@/5/ -e s/@W@/0/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll ; RUN: %{bf-fc} ORIG5110 +; RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR5110 +; RUN: %{ur-bf} -unroll-count=6 -unroll-uniform-weights | %{fc} UR5110 ; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR5110 ; RUN: %{ur-bf} -unroll-count=6 | %{fc} UR5110 ; diff --git a/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-partial-unconditional-latch.ll b/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-partial-unconditional-latch.ll index a87c9390ee780..84f92e519a1c3 100644 --- a/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-partial-unconditional-latch.ll +++ b/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-partial-unconditional-latch.ll @@ -67,6 +67,7 @@ ; their original probabilities are not contradicted. That is, the original loop ; latch's branch weights remain on all unrolled iterations' latches. ; +; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} MULT3 ; RUN: %{ur-bf} -unroll-count=3 | %{fc} MULT3 ; ; Sums to approximately the original loop body frequency, 10. @@ -91,7 +92,9 @@ ; ; -unroll-count=2, so there is 1 remaining conditional latch, so the ; implementation can compute uniform weights by solving a linear equation. +; Thus, -unroll-uniform-weights has no effect. ; +; RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} MULT2 ; RUN: %{ur-bf} -unroll-count=2 | %{fc} MULT2 ; ; Multiply by 2 to get the original loop body frequency, 10. @@ -111,7 +114,9 @@ ; ; -unroll-count=4, so there are 2 remaining conditional latches, so the ; implementation can compute uniform weights using the quadratic formula. +; Thus, -unroll-uniform-weights has no effect. ; +; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} MULT4 ; RUN: %{ur-bf} -unroll-count=4 | %{fc} MULT4 ; ; Multiply by 2 and sum to get the original loop body frequency, 10. @@ -136,14 +141,20 @@ ; ; -unroll-count=6, so there are 3 remaining conditional latches, the lowest ; number where the implementation cannot compute uniform weights using a -; simple formula. +; simple formula. Thus, this is our first case where -unroll-uniform-weights +; matters. ; -; RUN: %{ur-bf} -unroll-count=6 | %{fc} MULT6 +; RUN: %{ur-bf} -unroll-count=6 -unroll-uniform-weights | %{fc} MULT6,MUNIF6 +; RUN: %{ur-bf} -unroll-count=6 | %{fc} MULT6,MFAST6 ; -; Multiply by 2 and sum to get the original loop body frequency, 10. -; MULT6: - do.body: float = 2.1956, -; MULT6: - do.body.2: float = 1.476, -; MULT6: - do.body.4: float = 1.3284, +; For either MUNIF or MFAST, multiply by 2 and sum to get the original loop +; body frequency, 10. +; MUNIF6: - do.body: float = 2.0492, +; MUNIF6: - do.body.2: float = 1.6393, +; MUNIF6: - do.body.4: float = 1.3115, +; MFAST6: - do.body: float = 2.1956, +; MFAST6: - do.body.2: float = 1.476, +; MFAST6: - do.body.4: float = 1.3284, ; ; MULT6: call void @f ; MULT6-NOT: br @@ -152,19 +163,31 @@ ; MULT6: call void @f ; MULT6-NOT: br ; MULT6: call void @f -; MULT6: br i1 %{{.*}}, label %do.body.4, label %do.end, !prof !1 +; MULT6: br i1 %{{.*}}, label %do.body.4, label %do.end, +; MUNIF6-SAME: !prof !0 +; MFAST6-SAME: !prof !1 ; MULT6: call void @f ; MULT6-NOT: br ; MULT6: call void @f -; MULT6: br i1 %{{.*}}, label %do.body, label %do.end, !prof !1, !llvm.loop !2 -; -; There are 3 conditional latches remaining, so it adjusts the first and +; MULT6: br i1 %{{.*}}, label %do.body, label %do.end, +; MUNIF6-SAME: !prof !0, !llvm.loop !1 +; MFAST6-SAME: !prof !1, !llvm.loop !2 +; +; MUNIF6 is like applying -unroll-count=3 to MULT2 without converting any +; additional conditional latches to unconditional, so (approximately) +; MULT2's branch weights make sense. +; MUNIF6: !0 = !{!"branch_weights", i32 1717986944, i32 429496704} +; MUNIF6: !1 = distinct !{!1, !2, !3} +; MUNIF6: !2 = !{!"llvm.loop.estimated_trip_count", i32 2} +; MUNIF6: !3 = !{!"llvm.loop.unroll.disable"} +; +; There are 3 conditional latches remaining, so MFAST6 adjusts the first and ; leaves the second two with the original loop's branch weights. -; MULT6: !0 = !{!"branch_weights", i32 1443686486, i32 703797162} -; MULT6: !1 = !{!"branch_weights", i32 9, i32 1} -; MULT6: !2 = distinct !{!2, !3, !4} -; MULT6: !3 = !{!"llvm.loop.estimated_trip_count", i32 2} -; MULT6: !4 = !{!"llvm.loop.unroll.disable"} +; MFAST6: !0 = !{!"branch_weights", i32 1443686486, i32 703797162} +; MFAST6: !1 = !{!"branch_weights", i32 9, i32 1} +; MFAST6: !2 = distinct !{!2, !3, !4} +; MFAST6: !3 = !{!"llvm.loop.estimated_trip_count", i32 2} +; MFAST6: !4 = !{!"llvm.loop.unroll.disable"} ; ------------------------------------------------------------------------------ ; Check case when the original loop's number of iterations is a run-time @@ -185,6 +208,7 @@ ; implementation tries to compute uniform weights by solving a linear equation ; but ultimately sets the latch's probability to zero. ; +; RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} LOW2 ; RUN: %{ur-bf} -unroll-count=2 | %{fc} LOW2 ; ; Multiply by 2, but the result is greater than the original loop body @@ -205,6 +229,7 @@ ; implementation tries to compute uniform weights using the quadratic formula ; but ultimately sets both latches' probabilities to zero. ; +; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} LOW4 ; RUN: %{ur-bf} -unroll-count=4 | %{fc} LOW4 ; ; Multiply by 2 and sum, but the result is greater than the original loop body @@ -228,12 +253,13 @@ ; ; -unroll-count=6, so there are 3 remaining conditional latches. The ; implementation cannot compute uniform weights using a simple formula, and -; ultimately it must set all those latches' probabilities to zero. The -; implementation will face a new stumbling block starting at the second latch: -; reaching the remaining iterations already has a zero probability due to the -; zero probability set at the first latch, so the required probability could -; accidentally be computed as negative infinity. +; ultimately it must set all those latches' probabilities to zero. If not +; -unroll-uniform-weights, then the implementation will face a new stumbling +; block starting at the second latch: reaching the remaining iterations already +; has a zero probability due to the zero probability set at the first latch, so +; the required probability could accidentally be computed as negative infinity. ; +; RUN: %{ur-bf} -unroll-count=6 -unroll-uniform-weights | %{fc} LOW6 ; RUN: %{ur-bf} -unroll-count=6 | %{fc} LOW6 ; ; Multiply by 2 and sum, but the result is greater than the original loop body @@ -269,7 +295,7 @@ ; Because we test only partial unrolling, there is always exactly one unrolled ; iteration that can possibly exit, so only its latch can remain conditional. ; Because there is only one, its branch weights can be computed with a simple -; formula. +; formula, and -unroll-uniform-weights does not matter. ; ; Check the original loop body frequency. ; @@ -278,6 +304,7 @@ ; ; Check when the unrolled loop's backedge remains conditional. ; +; RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} CONST2 ; RUN: %{ur-bf} -unroll-count=2 | %{fc} CONST2 ; ; Multiply by 2 to get the original loop body frequency, 10. @@ -296,6 +323,7 @@ ; ; Check when the unrolled loop's backedge unconditionally continues. ; +; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} CONST4 ; RUN: %{ur-bf} -unroll-count=4 | %{fc} CONST4 ; ; Multiply by 2 and sum to get the original loop body frequency, 10. _______________________________________________ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
