https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646
Bug ID: 113646
Summary: PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: gcov-profile
Assignee: unassigned at gcc dot gnu.org
Reporter: jamborm at gcc dot gnu.org
CC: hubicka at gcc dot gnu.org
Blocks: 26163
Target Milestone: ---
Host: x86_64-linux, aarch64-linux
Target: x86_64-linux, aarch64-linux

Using profile guided optimization (PGO) is very detrimental when compiling the SPEC 2017 FPrate benchmark 538.imagick_r at -Ofast -march=native (with or without LTO) on all machines where I have tried.

On Zen4, the PGO binary is 68% slower than the non-PGO one without LTO and 65% slower with LTO:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=970.507.0&plot.1=966.507.0&plot.2=959.507.0&plot.3=958.507.0&

On Zen3, using PGO slows the binary down by 22% when not using LTO and by 30% with LTO:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=471.507.0&plot.1=473.507.0&plot.2=475.507.0&plot.3=477.507.0&

On Zen2, PGO regresses by 16% without LTO and by 28% with it:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=295.507.0&plot.1=293.507.0&plot.2=287.507.0&plot.3=286.507.0&

On our Altra CPU, the slowdowns are 26% and 45%, respectively:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=584.507.0&plot.1=583.507.0&plot.2=587.507.0&plot.3=589.507.0&

On an Intel CascadeLake machine, they are 24% and 41%. (Unfortunately, our LNT Intel machine is temporarily offline.)

It is of course possible that the training workload does not match the reference one very well. However, this was not a problem in the past; apparently our non-PGO results have improved while our PGO ones have not. Also, other compilers such as LLVM achieve better run-times with PGO than without.

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)
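
To make the percentages in the report concrete, here is a minimal sketch of the arithmetic. It assumes that "X% slower" means the relative increase in run time of the PGO binary over the non-PGO binary built with otherwise identical flags; the report does not spell out the definition. The function name and the sample run times are hypothetical and are not measurements from the report.

```python
def slowdown_percent(base_seconds: float, pgo_seconds: float) -> float:
    """Relative run-time increase of the PGO build over the non-PGO build, in percent.

    Assumption: a slowdown is (pgo - base) / base, expressed as a percentage.
    """
    if base_seconds <= 0:
        raise ValueError("base_seconds must be positive")
    return (pgo_seconds - base_seconds) * 100.0 / base_seconds


# Hypothetical run times, chosen only to illustrate the arithmetic:
# a PGO run that takes 1.68 times as long as the non-PGO run.
print(slowdown_percent(100.0, 168.0))
```

Under this definition a positive value means PGO made the binary slower, zero means no change, and a negative value means PGO helped.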