Hi,
> -----Original Message-----
> From: Kumar, Venkataramanan
> Sent: Tuesday, May 19, 2026 6:26 PM
> To: 'Hongtao Liu' <[email protected]>; Lili Cui <[email protected]>
> Cc: [email protected]; [email protected]; [email protected]; Jan Hubicka <[email protected]>
> Subject: RE: [PATCH] x86: Increase generic tune branch misprediction cost
>
> Hi
>
> > -----Original Message-----
> > From: Hongtao Liu <[email protected]>
> > Sent: Tuesday, May 19, 2026 2:04 PM
> > To: Lili Cui <[email protected]>
> > Cc: [email protected]; [email protected]; [email protected]; Jan Hubicka <[email protected]>
> > Subject: Re: [PATCH] x86: Increase generic tune branch misprediction cost
> >
> > Caution: This message originated from an External Source. Use proper
> > caution when opening attachments, clicking links, or responding.
> >
> >
> > On Fri, May 15, 2026 at 3:02 PM Lili Cui <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > This patch increases the branch misprediction scale for generic tuning to
> > > better reflect the cost on modern CPUs with deeper pipelines.
> > >
> > > Bootstrapped and regression tested on x86_64-pc-linux-gnu. OK for trunk?
> >
> > Any comments from AMD folks?
>
> Yes, planning to run few benchmarks, will get back to you.
>
> Regards,
> Venkat.
>
> >
> > >
> > > Thanks,
> > > Lili.
> > >
> > >
> > > Increase the branch misprediction scale for generic tuning from
> > > COSTS_N_INSNS (2) to COSTS_N_INSNS (2) + 3.
> > >
> > > Modern CPUs have deeper pipelines, making branch mispredictions more
> > > expensive. Increasing this cost encourages if-conversion, avoiding
> > > pipeline stalls from mispredicted branches.
> > >
> > > This improves 544.nab_r (-O2) by 12.7% on GNR and 12.1% on Znver5
> > > with single-copy.
> > >
> > > gcc/ChangeLog:
> > >
> > > 	* config/i386/x86-tune-costs.h (generic_cost): Increase branch
> > > 	mispredict scale from COSTS_N_INSNS (2) to COSTS_N_INSNS (2) + 3.
> > > ---
> > >  gcc/config/i386/x86-tune-costs.h | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
> > > index 7819fdf7c02..0c687a11a74 100644
> > > --- a/gcc/config/i386/x86-tune-costs.h
> > > +++ b/gcc/config/i386/x86-tune-costs.h
> > > @@ -4274,7 +4274,7 @@ struct processor_costs generic_cost = {
> > >    "16",                     /* Func alignment.  */
> > >    4,                        /* Small unroll limit.  */
> > >    2,                        /* Small unroll factor.  */
> > > -  COSTS_N_INSNS (2),        /* Branch mispredict scale.  */
> > > +  COSTS_N_INSNS (2) + 3,    /* Branch mispredict scale.  */
> > >  };

My reading of the generic tune change: br_mispredict_scale goes from
COSTS_N_INSNS(2) (8) to COSTS_N_INSNS(2) + 3 (11). With the default
branch_cost == 3 for generic, ix86_max_noce_ifcvt_seq_cost() for
unpredictable edges rises from 3 * 8 = 24 to 3 * 11 = 33, so RTL
if-conversion should accept costlier cmov/predicated sequences instead of
compare-and-branch.

Could you clarify how +3 was chosen? Was it tuned empirically (e.g. SPEC /
544.nab_r on GNR and Znver5), or tied to a model of the misprediction
penalty on recent cores (e.g. GNR)?

Regards,
Venkat.

> > >
> > >  /* core_cost should produce code tuned for Core familly of CPUs.  */
> > > --
> > > 2.34.1
> > >
> >
> >
> > --
> > BR,
> > Hongtao
