> On May 25, 2023, at 03:30, Cui, Lili via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>
> From: Lili Cui <lili....@intel.com>
>
> Make some changes in reassoc pass to make it more friendly to fma pass later.
> Using FMA instead of mult + add reduces register pressure and instructions
> retired.
>
> There are mainly two changes
> 1. Put no-mult ops and mult ops alternately at the end of the queue, which is
> conducive to generating more fma and reducing the loss of FMA when breaking
> the chain.
> 2. Rewrite the rewrite_expr_tree_parallel function to try to build parallel
> chains according to the given correlation width, keeping the FMA chance as
> much as possible.
>
> With the patch applied
>
> On ICX:
> 507.cactuBSSN_r: Improved by 1.7% for multi-copy.
> 503.bwaves_r   : Improved by 0.60% for single copy.
> 507.cactuBSSN_r: Improved by 1.10% for single copy.
> 519.lbm_r      : Improved by 2.21% for single copy.
> no measurable changes for other benchmarks.
>
> On aarch64
> 507.cactuBSSN_r: Improved by 1.7% for multi-copy.
> 503.bwaves_r   : Improved by 6.00% for single-copy.
> no measurable changes for other benchmarks.

Hi Cui,

I'm seeing a 4% slowdown on 436.cactusADM from SPEC CPU2006 on aarch64-linux-gnu (Cortex-A57) when compiling with "-O2 -flto". All other benchmarks seem neutral to this patch, and I didn't observe the slowdown with plain -O2 (no LTO) or with -O3.

Is this something interesting to investigate? I'll be happy to assist.

Kind regards,

--
Maxim Kuvyrkov
https://www.linaro.org