https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82666

Chih-Hsuan Yang <oscar.yang at cycraft dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |oscar.yang at cycraft dot com

--- Comment #20 from Chih-Hsuan Yang <oscar.yang at cycraft dot com> ---
Hello

A related x86_64 data point with a different loop shape, in case it helps triage
the general cost-model issue: libstdc++'s std::priority_queue push/pop is ~2x
slower with gcc than with clang at -O2. The cause is that the "pick the larger
child" comparison in `__adjust_heap` is if-converted into a 3-deep cmov chain,
and that chain sits on a pointer-chasing, loop-carried critical path (the
selected child index/pointer feeds the next load). Unlike the reduction here, it
is not a "cond ? x : 0" select, so `noce_try_cond_zero_arith` (comment #17)
doesn't cover it. Building with -fno-if-conversion -fno-if-conversion2 roughly
halves the cycle count, and the branch-miss rate stays below 0.1%.
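For anyone not familiar with the loop shape, below is a simplified max-heap
sift-down with the same "pick the larger child" step. It is an illustrative
sketch only, not the libstdc++ `__adjust_heap` source (which is structured
differently); `sift_down` and `pop_max` are hypothetical names, and I have not
checked that gcc emits the same cmov chain for this exact code. The point is
that the child index chosen on each iteration decides which element is loaded
next, so the select is on the loop-carried dependency chain rather than off to
the side like a reduction.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified sift-down (not the libstdc++ implementation).
// Restores the max-heap property after the element at `hole` was replaced.
void sift_down(std::vector<int>& heap, std::size_t hole) {
    const std::size_t len = heap.size();
    const int value = heap[hole];
    while (2 * hole + 1 < len) {
        std::size_t child = 2 * hole + 1;  // left child
        // "Pick the larger child": a data-dependent select that a compiler
        // may if-convert into cmov instead of a predictable branch.
        if (child + 1 < len && heap[child] < heap[child + 1])
            ++child;
        if (!(value < heap[child]))
            break;
        heap[hole] = heap[child];  // move the larger child up
        hole = child;              // the next load depends on this selection
    }
    heap[hole] = value;
}

// Hypothetical pop: move the last element to the root, then sift it down.
int pop_max(std::vector<int>& heap) {
    const int top = heap.front();
    heap.front() = heap.back();
    heap.pop_back();
    if (!heap.empty())
        sift_down(heap, 0);
    // Debug-only sanity check: the remaining elements still form a max-heap.
    assert(std::is_heap(heap.begin(), heap.end()));
    return top;
}
```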

I filed it separately as PR 125617 (minimal reproducer + perf numbers) to keep
the testcases distinct, but I'm cross-linking it here since it looks like the
same underlying if-conversion cost-model gap on x86.
