https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115842
--- Comment #12 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Hongtao Liu from comment #11)
> (In reply to Tamar Christina from comment #9)
> > (In reply to Hongtao Liu from comment #8)
> > > (In reply to Tamar Christina from comment #7)
> > > > (In reply to Hongtao Liu from comment #6)
> > > > > I noticed some double-counting of cost in group-candidate (regarding
> > > > > loop invariant expressions). This modification reduces the number of
> > > > > instructions executed by ~8% for the exchange_r binary compiled with
> > > > > -march=x86-64-v3 -O2.
> > > >
> > > > Note that this patch causes regressions on AArch64. While exchange
> > > > improves slightly, I see regressions in: leela, -5%, mcf, xz, x264,
> > > > deepsjeng -2%, geomean -1%
> > >
> > > What options do you use? We have an AmpereOne machine and would like to
> > > try to see if it's reproducible on it.
> >
> > This was on Neoverse-V2, but it's probably reproducible on AmpereOne. The
> > flags were -mcpu=native -Ofast -fomit-frame-pointer -flto=auto
>
> I tested my patch against the latest trunk with the same options, and I
> can't reproduce those regressions on AWS Graviton4.
>

Sorry for the slow response. I rebased and retried with the latest trunk, and
indeed I no longer see any slowdowns with current trunk.