https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123020
Bug ID: 123020
Summary: [missed-optimization] Equivalent ternary and if
produce different loop (extra compare in g())
Product: gcc
Version: 15.2.1
URL: https://godbolt.org/z/dT85acP66
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: skywave2023 at outlook dot com
Target Milestone: ---
Target: aarch64-apple-darwin25
GCC version:
Using built-in specs.
COLLECT_GCC=/opt/compiler-explorer/gcc-15.2.0/bin/g++
COLLECT_LTO_WRAPPER=/cefs/22/22e6cdc013c8541ce3d1548e_consolidated/compilers_c++_x86_gcc_15.2.0/bin/../libexec/gcc/x86_64-linux-gnu/15.2.0/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../gcc-15.2.0/configure
--prefix=/opt/compiler-explorer/gcc-build/staging
--enable-libstdcxx-backtrace=yes --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=x86_64-linux-gnu --disable-bootstrap
--enable-multiarch --with-abi=m64 --with-multilib-list=m32,m64,mx32
--enable-multilib --enable-clocale=gnu
--enable-languages=c,c++,fortran,ada,objc,obj-c++,go,d,m2,rust,cobol
--enable-ld=yes --enable-gold=yes --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --enable-linker-build-id --enable-lto
--enable-plugins --enable-threads=posix
--with-pkgversion=Compiler-Explorer-Build-gcc--binutils-2.44
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 15.2.0 (Compiler-Explorer-Build-gcc--binutils-2.44)
COLLECT_GCC_OPTIONS='-fdiagnostics-color=always' '-g' '-o' '/app/output.s'
'-fno-verbose-asm' '-v' '-O2' '-L./lib' '-shared-libgcc' '-mtune=generic'
'-march=x86-64' '-dumpdir' '/app/output.s-'
/cefs/22/22e6cdc013c8541ce3d1548e_consolidated/compilers_c++_x86_gcc_15.2.0/bin/../libexec/gcc/x86_64-linux-gnu/15.2.0/cc1plus
-quiet -v -imultiarch x86_64-linux-gnu -iprefix
/cefs/22/22e6cdc013c8541ce3d1548e_consolidated/compilers_c++_x86_gcc_15.2.0/bin/../lib/gcc/x86_64-linux-gnu/15.2.0/
-D_GNU_SOURCE <source> -quiet -dumpdir /app/output.s- -dumpbase example.cpp
-dumpbase-ext .cpp -mtune=generic -march=x86-64 -g -O2 -version
-fdiagnostics-color=always -fno-verbose-asm -o /tmp/ccIwbjB7.s
GNU C++17 (Compiler-Explorer-Build-gcc--binutils-2.44) version 15.2.0
(x86_64-linux-gnu)
compiled by GNU C version 11.4.0, GMP version 6.2.1, MPFR version
4.1.0, MPC version 1.2.1, isl version isl-0.24-GMP
E
Environment:
Reproduced on Compiler Explorer (godbolt.org) with the x86_64 gcc compiler
"x86-64 gcc 15.2" shown in the left-hand compiler selection.
This is a code-quality / missed-optimization issue in the optimizer, not a
miscompilation. The two functions below are semantically equivalent but GCC
generates a noticeably worse loop for the ternary form.
=== Minimal testcase (test.cpp) ===
using ui = unsigned;
using ull = unsigned long long;
constexpr int n = 100;
ui a[n + 1];
struct Result {
ui mx;
ull sum;
};
Result f() {
ui mx = 0;
ull sum = 0;
for (int i = 1; i <= n; ++i) {
if (mx < a[i]) {
mx = a[i];
sum += a[i];
}
}
return {mx, sum};
}
Result g() {
ui mx = 0;
ull sum = 0;
for (int i = 1; i <= n; ++i) {
bool flag = mx < a[i];
mx = flag ? a[i] : mx;
sum = flag ? sum + a[i] : sum;
}
return {mx, sum};
}
int main() {
a[1] = 1;
Result r1 = f();
Result r2 = g();
if (r1.mx != r2.mx || r1.sum != r2.sum)
__builtin_abort();
}
=== Semantics ===
Functions f() and g() are semantically equivalent in C++: both maintain the
maximum element seen so far in mx, and the sum of all elements that became a
new maximum in sum.
In g(), the boolean "flag" is computed as (mx < a[i]) using the old value of
mx and then used twice, once to update mx and once to update sum. This is
logically the same as the single "if" in f().
=== Actual code generated ===
With -O2 on x86_64, GCC produces a tight loop for f() that uses a single
compare and reuses the flags for both conditional moves. For example
(abridged Intel syntax, loop body only):
f():
mov ecx, DWORD PTR [rax] # load a[i]
mov rdx, rcx # copy value
add rcx, rsi # candidate = sum + a[i]
cmp edi, edx # compare mx and value
cmovb rsi, rcx # if (mx < value) sum = candidate
cmovb rdi, rdx # if (mx < value) mx = value
...
For g(), GCC generates code with two compares and extra moves in the inner
loop. For example, the loop body looks like:
g():
mov esi, DWORD PTR [rcx] # load a[i]
mov r8d, edx # save old mx
cmp edx, esi # first compare (mx vs a[i])
mov eax, esi # copy a[i]
cmovb rdx, rsi # update mx
add rsi, rdi # candidate = sum + a[i] (clobbers flags)
cmp r8d, eax # second compare (old mx vs a[i])
cmovb rdi, rsi # update sum
...
So g() ends up with two identical compares per iteration and some extra
register moves to preserve the operands for the second compare, even though
f() and g() have the same semantics.
This difference shows up in microbenchmarks when the array fits in cache,
but the issue is purely about the code shape in the generated loop.
=== Expected behaviour ===
It seems that the optimizer should be able to recognize the pattern in g()
and generate essentially the same loop as for f(): a single compare whose
condition codes are reused for two conditional updates (for mx and sum),
without repeating the comparison.
In other words, g() looks like a straightforward missed-optimization
opportunity in the presence of the ternary operator form.
=== Additional notes ===
- This is not a correctness issue; both f() and g() produce the same
result, as checked in main().
- The preprocessed source for this testcase, obtained using -save-temps,
is attached as test.ii.
- Suggested component: tree-optimization.
- Suggested keyword: missed-optimization.