Re: [PATCH 0/2] Align tight loops to solve cross cacheline issue

Hongtao Liu Sun, 26 May 2024 18:21:58 -0700

On Mon, May 20, 2024 at 11:15 AM Hongtao Liu <crazy...@gmail.com> wrote:
>
> On Wed, May 15, 2024 at 11:30 AM Jiang, Haochen <haochen.ji...@intel.com> wrote:
> >
> > Also cc Honza and Richard since we touched generic tune.
> >
> > Thx,
> > Haochen
> >
> > > -----Original Message-----
> > > From: Haochen Jiang <haochen.ji...@intel.com>
> > > Sent: Wednesday, May 15, 2024 11:04 AM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: Liu, Hongtao <hongtao....@intel.com>; ubiz...@gmail.com
> > > Subject: [PATCH 0/2] Align tight loops to solve cross cacheline issue
> > >
> > > Hi all,
> > >
> > > > Recently, we have seen several random commit-to-commit performance
> > > > regressions in benchmarks. They are caused by tight loops crossing
> > > > cacheline boundaries.
> > >
> > > > We are trying to solve the issue with two patches: one adjusts the loop
> > > > alignment for the generic tune, and the other aligns tight, hot loops more
> > > > aggressively.
> > >
> > > For SPECINT, we get a 0.85% improvement overall in rates, under option
> > > -O2 -march=x86-64-v3 -mtune=generic on Emerald Rapids.
> > >
> > > BenchMarks      EMR Rates
> > > 500.perlbench_r -1.21%
> > > 502.gcc_r       0.78%
> > > 505.mcf_r       0.00%
> > > 520.omnetpp_r   0.41%
> > > 523.xalancbmk_r 1.33%
> > > 525.x264_r      2.83%
> > > 531.deepsjeng_r 1.11%
> > > 541.leela_r     0.00%
> > > 548.exchange2_r 2.36%
> > > 557.xz_r        0.98%
> > > Geomean-int     0.85%
> > >
> > > > The side effect is a 1.40% increase in code size.
> > >
> > > BenchMarks      EMR Codesize
> > > 500.perlbench_r 0.70%
> > > 502.gcc_r       0.67%
> > > 505.mcf_r       3.26%
> > > 520.omnetpp_r   0.31%
> > > 523.xalancbmk_r 1.15%
> > > 525.x264_r      1.11%
> > > 531.deepsjeng_r 1.40%
> > > 541.leela_r     1.31%
> > > 548.exchange2_r 3.06%
> > > 557.xz_r        1.04%
> > > Geomean-int     1.40%
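The Geomean-int rows in the two tables above can be cross-checked from the per-benchmark figures. The sketch below is only an illustration: the helper name geomean_change is made up here, and it assumes each percentage is a change relative to the baseline, so the geometric mean is taken over the ratios 1 + p/100. Under that assumption it reproduces the reported 0.85% and 1.40% to two decimals.

```python
import math

def geomean_change(percent_changes):
    """Geometric mean of per-benchmark changes, returned as a percentage.

    Assumption: each entry p is a relative change in percent, so the
    underlying ratio is 1 + p/100.
    """
    ratios = [1.0 + p / 100.0 for p in percent_changes]
    log_mean = sum(math.log(r) for r in ratios) / len(ratios)
    return (math.exp(log_mean) - 1.0) * 100.0

# Figures copied from the tables in this mail (perlbench .. xz, in order).
emr_rates = [-1.21, 0.78, 0.00, 0.41, 1.33, 2.83, 1.11, 0.00, 2.36, 0.98]
emr_codesize = [0.70, 0.67, 3.26, 0.31, 1.15, 1.11, 1.40, 1.31, 3.06, 1.04]

print(f"rates:    {geomean_change(emr_rates):.2f}%")
print(f"codesize: {geomean_change(emr_codesize):.2f}%")
```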
> > >
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu.
Ok for this if there's no objection in 48 hours.
> > >
> > > > Once the patches have been on trunk for a month without any unexpected
> > > > problems, we plan to backport them to GCC 14.2.
> > >
> > > Thx,
> > > Haochen
> > >
> > > Haochen Jiang (1):
> > >   Adjust generic loop alignment from 16:11:8 to 16 for Intel processors
> For this one, the current znver{1,2,3,4,5}_cost tables already set loop
> alignment to 16, so I think it should be fine to set it in generic_cost too.
> > >
> > > liuhongt (1):
> > >   Align tight&hot loop without considering max skipping bytes.
> For this one, although we have seen similar growth on AMD's
> processors, it's still nice to have someone from AMD to look at this
> to see if it's what they need.
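To illustrate why the max-skip part of an alignment such as 16:11:8 matters for the two patches discussed above, here is a rough model. It follows the n:m:n2 semantics documented for GCC's -falign-loops option (align to n if that skips at most m-1 bytes, otherwise fall back to n2) and is not the code from either patch. The 64-byte cacheline, the 12-byte loop size, the start address and all function names are assumptions chosen for the example.

```python
CACHELINE = 64  # bytes; assumed line size for this illustration

def parse_align(spec):
    """Parse 'n[:m[:n2[:m2]]]' into a list of (alignment, max_skip) pairs."""
    fields = [int(f) for f in spec.split(":")]
    n = fields[0]
    m = fields[1] if len(fields) > 1 else n    # m defaults to n
    pairs = [(n, m - 1)]
    if len(fields) > 2:
        n2 = fields[2]
        m2 = fields[3] if len(fields) > 3 else n2  # m2 defaults to n2
        pairs.append((n2, m2 - 1))
    return pairs

def padding(addr, spec):
    """Padding bytes placed before a loop that would otherwise start at addr.

    Each (alignment, max_skip) pair is tried in order, like two consecutive
    alignment directives: a pair is applied only if it needs at most
    max_skip bytes of padding.
    """
    pad_total = 0
    for align, max_skip in parse_align(spec):
        pad = -(addr + pad_total) % align
        if pad <= max_skip:
            pad_total += pad
    return pad_total

def crosses_cacheline(start, size):
    """True if the byte range [start, start + size) spans two cachelines."""
    return start // CACHELINE != (start + size - 1) // CACHELINE

if __name__ == "__main__":
    addr, loop_size = 0x1035, 12   # hypothetical loop placement and size
    for spec in ("16:11:8", "16"):
        start = addr + padding(addr, spec)
        print(spec, hex(start), crosses_cacheline(start, loop_size))
```

In this made-up case, 16:11:8 would need 11 bytes to reach the 16-byte boundary, which exceeds the 10-byte limit, so the loop is only aligned to 8 and still straddles a cacheline boundary. Plain 16 pays the 11 bytes of padding and keeps the loop within one line, which is the code-size versus performance trade-off reported in the tables.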
> > >
> > >  gcc/config/i386/i386.cc          | 148 ++++++++++++++++++++++++++++++-
> > >  gcc/config/i386/i386.md          |  10 ++-
> > >  gcc/config/i386/x86-tune-costs.h |   2 +-
> > >  3 files changed, 154 insertions(+), 6 deletions(-)
> > >
> > > --
> > > 2.31.1
> >
>
>
> --
> BR,
> Hongtao




-- 
BR,
Hongtao
