On Tue, Jul 8, 2025 at 12:48 PM H.J. Lu <hjl.to...@gmail.com> wrote: > > aba3b9d3a48a0703fd565f7c5f0caf604f59970b is the first bad commit > commit aba3b9d3a48a0703fd565f7c5f0caf604f59970b > Author: H.J. Lu <hjl.to...@gmail.com> > Date: Fri May 9 07:17:07 2025 +0800 > > x86: Extend the remove_redundant_vector pass > > which removed non all 0s/1s redundant vector loads, caused SPEC CPU 2017 > 519.lbm_r and 470.lbm performance regressions on AMD znverN processors. > Add a tuning option to keep non all 0s/1s redundant vector loads on AMD > znverN processors.
Do we know what actually happens here or is this basically reverting the change based on a new tunable and the reported regression? If I read the pass correctly it might insert broadcasts on paths where not originally computed (it inserts after the scalar def, which might be far away). ix86_broadcast_inner suggests it replaces extracts from a broadcast with the original broadcast value/register which means it might increase lifetime of the broadcast register. Both shouldn't be causing specifically regressions on Zen2, but can be bad. I think we need to understand better what the pass does (it's written without much commentary, so I tried to quickly reverse engineer it), and improve it, avoiding cases where it obviously increases register lifetime. > gcc/ > > PR target/120941 > * config/i386/i386-features.cc (ix86_broadcast_inner): Keep > non all 0s/1s redundant vector loads if asked. > * config/i386/x86-tune.def (X86_TUNE_KEEP_REDUNDANT_VECTOR_LOAD): > New tuning. > > gcc/testsuite/ > > PR target/120941 > * gcc.target/i386/pr120941-1a.c: New test. > * gcc.target/i386/pr120941-1b.c: Likewise. > * gcc.target/i386/pr120941-1c.c: Likewise. > * gcc.target/i386/pr120941-1d.c: Likewise. > > OK for master? > > Thanks. > > -- > H.J.