> -----Original Message-----
> From: Kumar, Venkataramanan <[email protected]>
> Sent: Wednesday, May 20, 2026 1:52 AM
> To: Hongtao Liu <[email protected]>; Cui, Lili <[email protected]>
> Cc: [email protected]; Liu, Hongtao <[email protected]>;
> [email protected]; Jan Hubicka <[email protected]>
> Subject: RE: [PATCH] x86: Increase generic tune branch misprediction cost
> 
> Hi,
> 
> > -----Original Message-----
> > From: Kumar, Venkataramanan
> > Sent: Tuesday, May 19, 2026 6:26 PM
> > To: 'Hongtao Liu' <[email protected]>; Lili Cui <[email protected]>
> > Cc: [email protected]; [email protected]; [email protected];
> > Jan Hubicka <[email protected]>
> > Subject: RE: [PATCH] x86: Increase generic tune branch misprediction
> > cost
> >
> > Hi
> >
> > > -----Original Message-----
> > > From: Hongtao Liu <[email protected]>
> > > Sent: Tuesday, May 19, 2026 2:04 PM
> > > To: Lili Cui <[email protected]>
> > > Cc: [email protected]; [email protected];
> > > [email protected]; Jan Hubicka <[email protected]>
> > > Subject: Re: [PATCH] x86: Increase generic tune branch misprediction
> > > cost
> > >
> > >
> > > On Fri, May 15, 2026 at 3:02 PM Lili Cui <[email protected]> wrote:
> > > >
> > > > Hi,
> > > >
> > > > > This patch increases the branch misprediction scale for generic
> > > > > tuning to better reflect the cost on modern CPUs with deeper
> > > > > pipelines.
> > > >
> > > > > Bootstrapped and regression tested on x86_64-pc-linux-gnu. OK for
> > > > > trunk?
> > >
> > > Any comments from AMD folks?
> >
> > Yes, planning to run few benchmarks, will get back to you.
> >
> > Regards,
> > Venkat.
> >
> > >
> > > >
> > > > Thanks,
> > > > Lili.
> > > >
> > > >
> > > > Increase the branch misprediction scale for generic tuning from
> > > > COSTS_N_INSNS (2) to COSTS_N_INSNS (2) + 3.
> > > >
> > > > Modern CPUs have deeper pipelines, making branch mispredictions
> > > > more expensive. Increasing this cost encourages if-conversion,
> > > > avoiding pipeline stalls from mispredicted branches.
> > > >
> > > > This improves 544.nab_r (-O2) by 12.7% on GNR and 12.1% on Znver5
> > > > with single-copy.
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > >         * config/i386/x86-tune-costs.h (generic_cost): Increase branch
> > > > >         mispredict scale from COSTS_N_INSNS (2) to COSTS_N_INSNS (2) + 3.
> > > > ---
> > > >  gcc/config/i386/x86-tune-costs.h | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/gcc/config/i386/x86-tune-costs.h
> > > > b/gcc/config/i386/x86-tune-costs.h
> > > > index 7819fdf7c02..0c687a11a74 100644
> > > > --- a/gcc/config/i386/x86-tune-costs.h
> > > > +++ b/gcc/config/i386/x86-tune-costs.h
> > > > @@ -4274,7 +4274,7 @@ struct processor_costs generic_cost = {
> > > >    "16",                                        /* Func alignment.  */
> > > >    4,                                   /* Small unroll limit.  */
> > > >    2,                                   /* Small unroll factor.  */
> > > > -  COSTS_N_INSNS (2),                   /* Branch mispredict scale.  */
> > > > +  COSTS_N_INSNS (2) + 3,               /* Branch mispredict scale.  */
> > > >  };
> 
> My reading of the generic tune change: br_mispredict_scale goes from
> COSTS_N_INSNS(2) (8) to COSTS_N_INSNS(2) + 3 (11), and with default
> branch_cost == 3 for generic, ix86_max_noce_ifcvt_seq_cost() for
> unpredictable edges rises from 24 to 33, so RTL if-conversion should accept
> costlier cmov/predicated sequences instead of compare-and-branch.
> 
> Could you clarify how +3 was chosen? Was it tuned empirically (e.g. SPEC /
> 544.nab_r on GNR and Znver5), or tied to a model of the mispredict penalty on
> recent cores (e.g. GNR)?
> 
Hi Venkat,

The +3 value was chosen empirically by testing 544.nab_r, which improved by
about 12% on both GNR and Znver5. I also ran SPEC CPU2026 and saw a slight
overall improvement, but it was too small to measure reliably.

I also tested larger values, including 40 (matching the
max-rtl-if-conversion-unpredictable-cost default for other architectures), but
saw no additional gain on either GNR or Znver5. Other SPEC CPU2017/CPU2026
benchmarks were not sensitive to this parameter.

Therefore, +3 is the minimal adjustment needed to capture the full benefit,
which makes it a conservative choice for generic tuning. The original x86
threshold of 24 appears to be based on older microarchitectures, while modern
CPUs have deeper pipelines with higher misprediction penalties. The proposed
value may still be conservative, but we don't have better inputs to tune it
further at this time.
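
For reference, the arithmetic behind the 24 and 33 thresholds quoted above can
be sketched as follows. This is only a simplified model based on Venkat's
reading (threshold = branch_cost * br_mispredict_scale for unpredictable
edges), not the actual ix86_max_noce_ifcvt_seq_cost() code. The names
model_noce_threshold and check_thread_numbers are hypothetical, and
COSTS_N_INSNS is redefined locally the way GCC's rtl.h defines it.

```c
#include <assert.h>

/* Same definition as in GCC's rtl.h: one "instruction" is 4 cost units.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical helper, not a GCC function.  Models the default
   if-conversion cost limit for an unpredictable edge as
   branch_cost * mispredict_scale.  It ignores the --param overrides
   and the predictable-edge case.  */
unsigned int
model_noce_threshold (unsigned int branch_cost, unsigned int mispredict_scale)
{
  return branch_cost * mispredict_scale;
}

/* Check the model against the numbers quoted in this thread, assuming
   the generic branch_cost of 3 mentioned there.  */
void
check_thread_numbers (void)
{
  const unsigned int generic_branch_cost = 3;

  /* Before the patch: scale 8, limit 24.  */
  assert (model_noce_threshold (generic_branch_cost,
                                COSTS_N_INSNS (2)) == 24);
  /* After the patch: scale 11, limit 33.  */
  assert (model_noce_threshold (generic_branch_cost,
                                COSTS_N_INSNS (2) + 3) == 33);
}
```

Under this model, the 40 I also tested sits above the new limit of 33, so the
patch stays below the default used for other architectures.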

Regards,
Lili.

> Regards,
> Venkat.
> 
> > > >
> > > > >  /* core_cost should produce code tuned for Core familly of CPUs.  */
> > > > --
> > > > 2.34.1
> > > >
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
