On Sun, Mar 15, 2026 at 3:18 PM Philipp Tomsich <[email protected]> wrote:
>
> Richard,
>
> > One reason for the single-use check is that we want to avoid the
> > transform for a loop exit check where the result prevents coalescing
> > of the in-loop IV before/after update and thus requires a non-empty
> > latch block. IIRC there's code that tries to fixup during out-of-SSA,
> > but please double-check this actually works
>
> I've investigated this. insert_backedge_copies() in tree-outof-ssa.cc
> (lines 1289-1316) handles exactly this pattern: when a PHI result is
> used in an EQ/NE condition and the PHI arg is defined as
> result +/- INTEGER_CST, it adjusts the condition to use the
> post-increment value, restoring coalescability.
>
> Empirically, I built the compiler with and without the patch and
> compared assembly on three targets for several IV loop patterns
> designed to trigger the conflict (tight loops with unknown start/bound,
> with and without additional IV uses). No extra copy appeared in any
> inner loop.
>
> Results for the hot inner loops:
>
>   tight_unknown (tight loop, unknown IV start):
>     x86-64:  patched slightly better (fewer callee-saves, testl vs cmpl $1)
>     AArch64: patched better (4 vs 5 insns -- cbnz fuses cmp+branch;
>              1 vs 2 callee-saves)
>     RISC-V:  patched slightly worse (4 vs 3 insns -- see below)
>
>   tight_use_after (IV + array access in loop body):
>     x86-64:  patched slightly worse (8 vs 7 insns -- different IVOPTS
>              addressing)
>     AArch64: neutral (8 vs 8 insns)
>     RISC-V:  neutral (6 vs 6 insns)
>
> The differences are not from coalescing failures but from downstream
> pass decisions. The one RISC-V regression in tight_unknown is a
> static profile estimation artefact: the compare-against-zero heuristic
> assigns 74% probability to `i == 0` (vs 20% for the baseline's
> `i+1 == 1`), causing the cold sink() call path to become the
> fall-through. This does not occur with PGO.
>
> Note that this relaxation only fires when the folded constant is
> zero, i.e., (X + C) == C -> X == 0. In a loop IV context this means
> the check is true only when the IV is zero (typically the first
> iteration), which compilers tend to peel or constant-fold anyway.
> The primary beneficiary is non-loop code like the motivating case
> (++*a == 1 -> *a == 0).
>
> Is the patch OK as-is, or would you prefer an additional guard
> (e.g., checking that @3 is not defined inside a loop)?

Thanks for double-checking, the patch is OK for stage1 as-is, we may
not use this kind of guards in match.pd.

Richard.

>
> Philipp
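
For readers following the thread, the identity behind the fold discussed above, (X + C) == C -> X == 0, can be checked with a small C sketch. This is only an illustration, not the match.pd pattern from the patch: the function names (before_fold, after_fold, incr_is_one, check_fold_equivalence) are hypothetical, and unsigned operands are an assumption made here so that wraparound is well-defined.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Shape before the fold: compare the sum against the same constant. */
static int before_fold(unsigned x, unsigned c)
{
  return (x + c) == c;
}

/* Shape after the fold: compare the original value against zero. */
static int after_fold(unsigned x)
{
  return x == 0;
}

/* The motivating shape from the thread: increment through a pointer,
   then compare the incremented value against 1. */
static int incr_is_one(unsigned *a)
{
  return ++*a == 1;
}

/* Hypothetical helper: confirm both shapes agree on sample values,
   including ones where x + c wraps around. */
static void check_fold_equivalence(void)
{
  const unsigned xs[] = { 0u, 1u, 2u, 41u, UINT_MAX - 1u, UINT_MAX };
  const unsigned cs[] = { 0u, 1u, 7u, UINT_MAX };

  for (size_t i = 0; i < sizeof xs / sizeof xs[0]; i++)
    for (size_t j = 0; j < sizeof cs / sizeof cs[0]; j++)
      assert(before_fold(xs[i], cs[j]) == after_fold(xs[i]));
}
```

In modular arithmetic x + c equals c exactly when x is zero, so the two shapes agree for every unsigned input. incr_is_one returns nonzero only when the pointed-to value was zero before the increment, which is the `*a == 0` form mentioned in the thread.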
