Hi,

> -----Original Message-----
> From: Kumar, Venkataramanan
> Sent: Tuesday, May 19, 2026 6:26 PM
> To: 'Hongtao Liu' <[email protected]>; Lili Cui <[email protected]>
> Cc: [email protected]; [email protected]; [email protected]; Jan
> Hubicka <[email protected]>
> Subject: RE: [PATCH] x86: Increase generic tune branch misprediction cost
>
> Hi
>
> > -----Original Message-----
> > From: Hongtao Liu <[email protected]>
> > Sent: Tuesday, May 19, 2026 2:04 PM
> > To: Lili Cui <[email protected]>
> > Cc: [email protected]; [email protected]; [email protected];
> > Jan Hubicka <[email protected]>
> > Subject: Re: [PATCH] x86: Increase generic tune branch misprediction
> > cost
> >
> > On Fri, May 15, 2026 at 3:02 PM Lili Cui <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > This patch increases the branch misprediction scale for generic
> > > tuning to better reflect the cost on modern CPUs with deeper pipelines.
> > >
> > > Bootstrapped and regression tested on x86_64-pc-linux-gnu. OK for trunk?
> >
> > Any comments from AMD folks?
>
> Yes, I'm planning to run a few benchmarks and will get back to you.
>
> Regards,
> Venkat.
>
> >
> > >
> > > Thanks,
> > > Lili.
> > >
> > >
> > > Increase the branch misprediction scale for generic tuning from
> > > COSTS_N_INSNS (2) to COSTS_N_INSNS (2) + 3.
> > >
> > > Modern CPUs have deeper pipelines, making branch mispredictions more
> > > expensive. Increasing this cost encourages if-conversion, avoiding
> > > pipeline stalls from mispredicted branches.
> > >
> > > This improves 544.nab_r (-O2) by 12.7% on GNR and 12.1% on Znver5
> > > in single-copy runs.
> > >
> > > gcc/ChangeLog:
> > >
> > >         * config/i386/x86-tune-costs.h (generic_cost): Increase branch
> > >         mispredict scale from COSTS_N_INSNS (2) to COSTS_N_INSNS (2) + 3.
> > > ---
> > >  gcc/config/i386/x86-tune-costs.h | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
> > > index 7819fdf7c02..0c687a11a74 100644
> > > --- a/gcc/config/i386/x86-tune-costs.h
> > > +++ b/gcc/config/i386/x86-tune-costs.h
> > > @@ -4274,7 +4274,7 @@ struct processor_costs generic_cost = {
> > >    "16",                                        /* Func alignment.  */
> > >    4,                                   /* Small unroll limit.  */
> > >    2,                                   /* Small unroll factor.  */
> > > -  COSTS_N_INSNS (2),                   /* Branch mispredict scale.  */
> > > +  COSTS_N_INSNS (2) + 3,               /* Branch mispredict scale.  */
> > >  };

My reading of the generic tuning change: br_mispredict_scale goes from
COSTS_N_INSNS (2) (= 8) to COSTS_N_INSNS (2) + 3 (= 11). With the generic
default branch_cost of 3, the limit returned by ix86_max_noce_ifcvt_seq_cost ()
for unpredictable edges therefore rises from 3 * 8 = 24 to 3 * 11 = 33, so RTL
if-conversion should accept costlier cmov/predicated sequences in place of
compare-and-branch.

Could you clarify how the +3 was chosen? Was it tuned empirically (e.g. on
SPEC / 544.nab_r on GNR and Znver5), or derived from a model of the
misprediction penalty on recent cores (e.g. GNR)?

Regards,
Venkat.

> > >
> > >  /* core_cost should produce code tuned for Core familly of CPUs.  */
> > > --
> > > 2.34.1
> > >
> >
> >
> > --
> > BR,
> > Hongtao
