On Sun, Mar 15, 2026 at 3:18 PM Philipp Tomsich
<[email protected]> wrote:
>
> Richard,
>
> > One reason for the single-use check is that we want to avoid the
> > transform for a loop exit check where the result prevents coalescing
> > of the in-loop IV before/after update and thus requires a non-empty
> > latch block.  IIRC there's code that tries to fixup during out-of-SSA,
> > but please double-check this actually works
>
> I've investigated this.  insert_backedge_copies() in tree-outof-ssa.cc
> (lines 1289-1316) handles exactly this pattern: when a PHI result is
> used in an EQ/NE condition and the PHI arg is defined as
> result +/- INTEGER_CST, it adjusts the condition to use the
> post-increment value, restoring coalescability.
>
> Empirically, I built the compiler with and without the patch and
> compared assembly on three targets for several IV loop patterns
> designed to trigger the conflict (tight loops with unknown start/bound,
> with and without additional IV uses).  No extra copy appeared in any
> inner loop.
>
> Results for the hot inner loops:
>
>   tight_unknown (tight loop, unknown IV start):
>     x86-64:  patched slightly better (fewer callee-saves, testl vs cmpl $1)
>     AArch64: patched better (4 vs 5 insns -- cbnz fuses cmp+branch;
>              1 vs 2 callee-saves)
>     RISC-V:  patched slightly worse (4 vs 3 insns -- see below)
>
>   tight_use_after (IV + array access in loop body):
>     x86-64:  patched slightly worse (8 vs 7 insns -- different IVOPTS
>              addressing)
>     AArch64: neutral (8 vs 8 insns)
>     RISC-V:  neutral (6 vs 6 insns)
>
> The differences are not from coalescing failures but from downstream
> pass decisions.  The one RISC-V regression in tight_unknown is a
> static profile estimation artefact: the compare-against-zero heuristic
> assigns 74% probability to `i == 0` (vs 20% for the baseline's
> `i+1 == 1`), causing the cold sink() call path to become the
> fall-through.  This does not occur with PGO.
>
> Note that this relaxation only fires when the folded constant is
> zero, i.e., (X + C) == C -> X == 0.  In a loop IV context this means
> the check is true only when the IV is zero (typically the first
> iteration), which compilers tend to peel or constant-fold anyway.
> The primary beneficiary is non-loop code like the motivating case
> (++*a == 1 -> *a == 0).
>
> Is the patch OK as-is, or would you prefer an additional guard
> (e.g., checking that @3 is not defined inside a loop)?
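
For readers of the archive, the fold under discussion,
(X + C) == C -> X == 0, can be sketched in plain C.  This is only an
illustration, not the match.pd pattern from the patch: all function
names below are hypothetical, and unsigned operands are assumed so that
wraparound is well defined.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: the comparison as written before folding. */
static int before_fold(unsigned x, unsigned c)
{
    return (x + c) == c;
}

/* Hypothetical helper: the folded form, valid for any constant c
   because unsigned addition wraps modulo 2^N. */
static int after_fold(unsigned x)
{
    return x == 0;
}

/* The motivating shape from the thread: ++*a == 1. */
static int incr_is_one(unsigned *a)
{
    return ++*a == 1;
}

/* The same test on the pre-increment value: *a == 0, then increment. */
static int zero_then_incr(unsigned *a)
{
    int r = (*a == 0);
    ++*a;
    return r;
}

/* Compare both forms on a few values, including the wraparound edges.
   Returns 1 if every assertion held. */
static int check_fold(void)
{
    const unsigned xs[] = { 0u, 1u, 2u, 41u, UINT_MAX - 1u, UINT_MAX };
    const unsigned cs[] = { 0u, 1u, 7u, UINT_MAX };

    for (size_t i = 0; i < sizeof xs / sizeof xs[0]; i++) {
        for (size_t j = 0; j < sizeof cs / sizeof cs[0]; j++)
            assert(before_fold(xs[i], cs[j]) == after_fold(xs[i]));

        unsigned p = xs[i], q = xs[i];
        assert(incr_is_one(&p) == zero_then_incr(&q));
        assert(p == q);  /* both forms leave the same stored value */
    }
    return 1;
}
```

Calling check_fold() from a small main() exercises both forms.  The
sketch says nothing about signed operands or about the single-use
condition being relaxed.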

Thanks for double-checking.  The patch is OK for stage1 as-is; we
may not use this kind of guard in match.pd.

Richard.

>
> Philipp
