https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82666
Chih-Hsuan Yang <oscar.yang at cycraft dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |oscar.yang at cycraft dot com
--- Comment #20 from Chih-Hsuan Yang <oscar.yang at cycraft dot com> ---
Hello
A related x86_64 data point with a different loop shape, in case it helps triage
the general cost-model issue. libstdc++'s std::priority_queue push/pop is ~2x
slower under gcc than clang at -O2 because the "pick the larger child"
comparison in `__adjust_heap` is if-converted into a 3-deep cmov chain on a
pointer-chasing, loop-carried critical path (the selected child index/pointer
feeds the next load). Unlike the reduction here, it is not a "cond ? x : 0"
select, so `noce_try_cond_zero_arith` (comment #17) doesn't cover it.
Building with -fno-if-conversion -fno-if-conversion2 roughly halves cycles, with
a branch-miss rate below 0.1%.
I filed it separately as PR 125617 (minimal reproducer + perf numbers) to keep
the testcases distinct, but cross-linking here since it looks like the same
underlying if-conversion cost-model gap on x86.