On Mon, May 20, 2024 at 11:15 AM Hongtao Liu <crazy...@gmail.com> wrote:
>
> On Wed, May 15, 2024 at 11:30 AM Jiang, Haochen <haochen.ji...@intel.com> wrote:
> >
> > Also cc Honza and Richard since we touched generic tune.
> >
> > Thx,
> > Haochen
> >
> > > -----Original Message-----
> > > From: Haochen Jiang <haochen.ji...@intel.com>
> > > Sent: Wednesday, May 15, 2024 11:04 AM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: Liu, Hongtao <hongtao....@intel.com>; ubiz...@gmail.com
> > > Subject: [PATCH 0/2] Align tight loops to solve cross cacheline issue
> > >
> > > Hi all,
> > >
> > > Recently, we have encountered several random performance regressions in
> > > benchmarks commit to commit. It is caused by cross cacheline issue for
> > > tight loops.
> > >
> > > We are trying to solve the issue by two patches. One is adjusting the loop
> > > alignment for generic tune, the other is aligning tight and hot loops more
> > > aggressively.
> > >
> > > For SPECINT, we get a 0.85% improvement overall in rates, under option
> > > -O2 -march=x86-64-v3 -mtune=generic on Emerald Rapids.
> > >
> > > BenchMarks         EMR Rates
> > > 500.perlbench_r    -1.21%
> > > 502.gcc_r           0.78%
> > > 505.mcf_r           0.00%
> > > 520.omnetpp_r       0.41%
> > > 523.xalancbmk_r     1.33%
> > > 525.x264_r          2.83%
> > > 531.deepsjeng_r     1.11%
> > > 541.leela_r         0.00%
> > > 548.exchange2_r     2.36%
> > > 557.xz_r            0.98%
> > > Geomean-int         0.85%
> > >
> > > Side effect is that we get a 1.40% increase in codesize.
> > >
> > > BenchMarks         EMR Codesize
> > > 500.perlbench_r     0.70%
> > > 502.gcc_r           0.67%
> > > 505.mcf_r           3.26%
> > > 520.omnetpp_r       0.31%
> > > 523.xalancbmk_r     1.15%
> > > 525.x264_r          1.11%
> > > 531.deepsjeng_r     1.40%
> > > 541.leela_r         1.31%
> > > 548.exchange2_r     3.06%
> > > 557.xz_r            1.04%
> > > Geomean-int         1.40%
> > >
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu. Ok for this if there's
> > > no objection in 48 hours.
> > >
> > > After we committed into trunk for a month, if there isn't any unexpected
> > > happen. We planned to backport it to GCC14.2.
> > >
> > > Thx,
> > > Haochen
> > >
> > > Haochen Jiang (1):
> > >   Adjust generic loop alignment from 16:11:8 to 16 for Intel processors
> For this one, current znver{1,2,3,4,5}_cost already set loop align as
> 16, so I think it should be fine set it to generic_cost.
> > >
> > > liuhongt (1):
> > >   Align tight&hot loop without considering max skipping bytes.
> For this one, although we have seen similar growth on AMD's
> processors, it's still nice to have someone from AMD to look at this
> to see if it's what they need.
> > >
> > >  gcc/config/i386/i386.cc          | 148 ++++++++++++++++++++++++++++++-
> > >  gcc/config/i386/i386.md          |  10 ++-
> > >  gcc/config/i386/x86-tune-costs.h |   2 +-
> > >  3 files changed, 154 insertions(+), 6 deletions(-)
> > >
> > > --
> > > 2.31.1
> > >
>
> --
> BR,
> Hongtao
--
BR,
Hongtao