https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124127
Bug ID: 124127
Summary: [16 Regression] 9% slowdown of 503.bwaves_r on Zen3 since r16-1644-gaba3b9d3a48a07
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: pheeck at gcc dot gnu.org
CC: hjl at gcc dot gnu.org
Blocks: 26163
Target Milestone: ---
Host: x86_64-linux
Target: x86_64-linux
** This bug was split out from pr120957 **
As seen on this LNT graph
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=471.427.0
there was a 9% execution-time slowdown of the 503.bwaves_r SPEC CPU 2017
benchmark when compiled with -Ofast -march=native on an AMD Zen3 machine.
I bisected the slowdown to r16-1644-gaba3b9d3a48a07
commit aba3b9d3a48a0703fd565f7c5f0caf604f59970b
Author: H.J. Lu
AuthorDate: Fri May 9 07:17:07 2025 +0800
Date: Fri May 9 07:17:07 2025 +0800
Commit: H.J. Lu
CommitDate: Tue Jun 24 14:02:56 2025 +0800
x86: Extend the remove_redundant_vector pass
Extend the remove_redundant_vector pass to handle vector broadcasts from
constant and variable scalars. When broadcasting from constants and
function arguments, we can place a single widest vector broadcast at
entry of the nearest common dominator for basic blocks with all uses
since constants and function arguments aren't changed. For broadcast
from variables with a single definition, the single definition is
replaced with the widest broadcast.
When this slowdown was introduced, bwaves built with GCC 16 was only slightly
slower than with GCC 15. GCC 15 has improved since then, so I think this can be
considered a regression. See the comparison here:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=1064.427.0&plot.1=1181.427.0&plot.2=471.427.0&
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)