Re: [PATCH (v2) GCC17-stage1] match.pd: Relax single_use for fold-to-zero comparisons

Philipp Tomsich Sun, 15 Mar 2026 07:18:17 -0700

Richard,

> One reason for the single-use check is that we want to avoid the
> transform for a loop exit check where the result prevents coalescing
> of the in-loop IV before/after update and thus requires a non-empty
> latch block.  IIRC there's code that tries to fixup during out-of-SSA,
> but please double-check this actually works


I've investigated this.  insert_backedge_copies() in tree-outof-ssa.cc
(lines 1289-1316) handles exactly this pattern: when a PHI result is
used in an EQ/NE condition and the PHI arg is defined as
result +/- INTEGER_CST, it adjusts the condition to use the
post-increment value, restoring coalescability.
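As a source-level analogy of that adjustment (a sketch only, not the
GIMPLE that tree-outof-ssa.cc operates on; the function names are
hypothetical), the two loops below differ only in whether the exit test
reads the pre-increment value or the post-increment value.  Unsigned
arithmetic is assumed so that the wrap-around case is well defined:

```c
#include <assert.h>

/* Exit test on the pre-increment value `i` (the PHI result).
   `i` has to stay live after `next` is computed, so the two
   values cannot share a register without a copy.  */
static unsigned trips_pre(unsigned start, unsigned bound)
{
  unsigned i = start, trips = 0;
  for (;;)
    {
      unsigned next = i + 1;
      trips++;
      if (i == bound)
        break;
      i = next;
    }
  return trips;
}

/* Same exit test rewritten against the post-increment value:
   i == bound  <=>  i + 1 == bound + 1  (modulo 2^N).
   Only `next` is live at the test, so `i` and `next` can coalesce.  */
static unsigned trips_post(unsigned start, unsigned bound)
{
  unsigned i = start, trips = 0;
  for (;;)
    {
      unsigned next = i + 1;
      trips++;
      if (next == bound + 1)
        break;
      i = next;
    }
  return trips;
}

/* Both forms must execute the same number of iterations.  */
static void check_same_trips(unsigned start, unsigned bound)
{
  assert(trips_pre(start, bound) == trips_post(start, bound));
}
```

Calling check_same_trips(0, 3) exercises both forms over four
iterations; the second form is the shape the out-of-SSA fixup aims for.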

Empirically, I built the compiler with and without the patch and
compared the generated assembly on three targets (x86-64, AArch64 and
RISC-V) for several IV loop patterns designed to trigger the
coalescing conflict (tight loops with unknown start/bound, with and
without additional IV uses).  No extra copy appeared in any inner
loop.

Results for the hot inner loops:

  tight_unknown (tight loop, unknown IV start):
    x86-64:  patched slightly better (fewer callee-saves, testl vs cmpl $1)
    AArch64: patched better (4 vs 5 insns -- cbnz fuses cmp+branch;
             1 vs 2 callee-saves)
    RISC-V:  patched slightly worse (4 vs 3 insns -- see below)

  tight_use_after (IV + array access in loop body):
    x86-64:  patched slightly worse (8 vs 7 insns -- different IVOPTS
             addressing)
    AArch64: neutral (8 vs 8 insns)
    RISC-V:  neutral (6 vs 6 insns)

The differences are not from coalescing failures but from downstream
pass decisions.  The one RISC-V regression in tight_unknown is a
static profile estimation artefact: the compare-against-zero heuristic
assigns 74% probability to `i == 0` (vs 20% for the baseline's
`i+1 == 1`), causing the cold sink() call path to become the
fall-through.  This does not occur with PGO.

Note that this relaxation only fires when the folded constant is
zero, i.e., (X + C) == C -> X == 0.  In a loop IV context this means
the check is true only when the IV is zero (typically the first
iteration), which compilers tend to peel or constant-fold anyway.
The primary beneficiary is non-loop code like the motivating case
(++*a == 1 -> *a == 0).
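For illustration, the motivating case can be written out in C as
follows.  This is a minimal sketch with hypothetical helper names; it
assumes an unsigned operand so that the increment wraps instead of
overflowing, and it is not taken from the patch or its testsuite:

```c
#include <assert.h>
#include <stdbool.h>

/* Unfolded form: (X + 1) == 1, where X + 1 is also stored back,
   so the addition has a second use besides the comparison.  */
static bool incr_eq_one(unsigned *a)
{
  return ++*a == 1;
}

/* Folded form: X == 0 on the old value.  The increment is kept
   because the store still needs it.  */
static bool incr_old_eq_zero(unsigned *a)
{
  unsigned old = *a;
  *a = old + 1;
  return old == 0;
}

/* Both forms must return the same result and leave the same
   value in memory for any starting value.  */
static void check_fold(unsigned start)
{
  unsigned x = start, y = start;
  bool rx = incr_eq_one(&x);
  bool ry = incr_old_eq_zero(&y);
  assert(rx == ry);
  assert(x == y);
}
```

check_fold(0) covers the only input for which the comparison is true;
any other start value, including the all-ones value that wraps to
zero, yields false in both forms.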

Is the patch OK as-is, or would you prefer an additional guard
(e.g., checking that @3 is not defined inside a loop)?

Philipp