> 
> I have no issues on my end, but I hope Hubicka can take a look at the
> znver6 tune part and approve this patch.
> > +  /* Zen5 can execute:
> > +      - integer ops: 6 per cycle, at most 3 multiplications.
> > +       latency 1 for additions, 3 for multiplications (pipelined)
> > +
> > +       Setting width of 9 for multiplication is probably excessive
> > +       for register pressure.
> > +      - fp ops: 2 additions per cycle, latency 2-3
> > +               2 multiplicaitons per cycle, latency 3
> > +      - vector intger ops: 4 additions, latency 1
> > +                          2 multiplications, latency 4
> > +       We increase width to 6 for multiplications
> > +       in ix86_reassociation_width.  */

I know this is just cut&paste from znver5 at this point, but I would at
least drop the comment (the alternative would be to just use the znver5
cost table until a later stage, when the values get updated for real
hardware).
> > diff --git a/gcc/config/i386/x86-tune-sched.cc b/gcc/config/i386/x86-tune-sched.cc
> > index 11b33382ecb..772f7af6541 100644
> > --- a/gcc/config/i386/x86-tune-sched.cc
> > +++ b/gcc/config/i386/x86-tune-sched.cc
> > @@ -113,6 +113,10 @@ ix86_issue_rate (void)
> >      case PROCESSOR_NOVALAKE:
> >        return 8;
> >
> > +    /* Issue rate we are changing to 8 considering the Dispatch width */
> > +    case PROCESSOR_ZNVER6:
> > +      return 8;

I think you are still using the znver5 scheduler description? If so, the
scheduler will never be able to issue all 8 instructions per cycle, since
the bottleneck modelled is the decoder. So this would just (noticeably)
increase compile time. Please keep the znver5 setting, with a comment that
it is not technically correct.
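
To illustrate the point, here is a toy model of the argument. It is not
GCC code: the function name is made up, and the decoder width used in the
usage note is a placeholder rather than the real znver5 figure.

```cpp
#include <algorithm>

// Toy model, not GCC code.  The scheduler may try to issue up to
// ISSUE_RATE insns per cycle, but the pipeline description only lets
// DECODE_WIDTH of them through the modelled decoder, so the smaller
// of the two bounds what can actually be placed in one cycle.
static int
insns_per_cycle (int issue_rate, int decode_width)
{
  return std::min (issue_rate, decode_width);
}
```

With a hypothetical decode width of 4, insns_per_cycle (6, 4) and
insns_per_cycle (8, 4) give the same result, so raising the issue rate
alone only gives the scheduler more candidates to consider per cycle
without changing the schedule it can produce.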

With these changes the tune bits are OK.
Honza
