https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125809

            Bug ID: 125809
           Summary: [16 Regression] ipa-cp over-clones under a guessed
                    profile: devirtualization_time_bonus
                    frequency-weighting (ad3fb999a1b) inflates the bonus,
                    ~8% slower cc1 build (SPEC 721.gcc_r)
           Product: gcc
           Version: 16.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: ipa
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ptomsich at gcc dot gnu.org
  Target Milestone: ---

r16 commit ad3fb999a1b56893f0f6296a52fe2af550763fee "Improve ipa-cp
devirtualization costing" changed devirtualization_time_bonus to weight each
devirtualizable indirect call's saving by ie->combined_sreal_frequency().

That is correct with a real profile. Under a guessed/static profile (e.g.
-O2/LTO without PGO), however, the frequency of calls in a hot, loop-nested
function is over-estimated. This inflates the bonus and pushes the function
past the per-value cloning threshold, creating extra context clones.

Concretely, it slows down SPEC CPU 2026 721.gcc_r (-Ofast -flto -mcpu=ampere1,
no PGO) by ~8% (319.2 s -> 347.7 s): tree-ssa-sccvn.cc:process_bb (called from
do_rpo_vn) is split into two context clones (iterate=0/iterate=1) instead of
one.

Self-contained reproducer (gcc -O2 -c -fdump-ipa-cp-details t.c):


int sink;
extern int (*gp) (int);
static int cb (int x) {
  int r = x;
  r = r*3+1; r ^= r>>2; r += r<<3; r -= r>>1;
  r = r*5+7; r ^= r>>4; r += r<<2; r -= r>>3;
  r = r*9+2; r ^= r>>5; return r;
}
static int __attribute__((noinline))
worker (int (*fn)(int), int *a, int n, int m) {
  int s = 0;
  for (int j = 0; j < m; j++)
    for (int i = 0; i < n; i++)
      s += fn (a[i]);
  return s;
}
void caller0 (int *a, int n, int m) { sink += worker (cb, a, n, m); }
void caller1 (int *a, int n, int m) { sink += worker (gp, a, n, m); }



Before ad3fb999a1b (and with our proposed fix), worker is not specialized: its
one hot indirect call is not worth a clone under a guessed profile. After
ad3fb999a1b, it is cloned (the dump reports "Creating a specialized node of
worker"): the good_cloning_opportunity_p evaluation jumps from 153 to 4153
(threshold 500), purely because of the frequency factor.

Proposed fix (validated: process_bb is back to one clone ... which recovers the
~8%): frequency-weight the bonus only when ie->count.reliable_p(); otherwise
use the unweighted saving (the pre-ad3fb999a1b behaviour). This keeps the
improvement for PGO/AFDO while avoiding over-cloning under guessed profiles.

We have a prototype patch that I can attach if this sounds like the right
direction.
