https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85390
Bug ID: 85390
Summary: possible missed optimisation / regression from 6.3 with conditional expression
Product: gcc
Version: 8.0.1
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vegard.nossum at oracle dot com
Target Milestone: ---
Input:
extern int a, b, c;
int f(int x)
{
    __builtin_prefetch((void *) (x ? a : b));
    return c;
}
Current trunk with -O3 produces this:
f(int):
        testl %edi, %edi
        je .L2
        movslq a(%rip), %rax
        prefetcht0 (%rax)
        movl c(%rip), %eax
        ret
.L2:
        movslq b(%rip), %rax
        prefetcht0 (%rax)
        movl c(%rip), %eax
        ret
GCC 6.3.0, by contrast, produced branchless code:
f(int):
        movslq a(%rip), %rdx
        movslq b(%rip), %rax
        testl %edi, %edi
        cmovne %rdx, %rax
        prefetcht0 (%rax)
        movl c(%rip), %eax
        ret
For reference, clang also outputs a branchless (but slightly longer) version:
f(int): # @f(int)
        testl %edi, %edi
        movl $a, %eax
        movl $b, %ecx
        cmovneq %rax, %rcx
        movslq (%rcx), %rax
        prefetcht0 (%rax)
        movl c(%rip), %eax
        retq
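The two branchless listings differ in what they select: the 6.3.0 code loads both a and b and then picks a value, while the clang code picks an address and then does a single load. A rough source-level sketch of the two shapes follows. The function names f_select_value and f_select_address are made up for illustration, a, b and c are defined here (rather than declared extern) only so the sketch links on its own, and nothing guarantees that a given compiler will emit a cmov for either form.

```c
#include <stdint.h>

/* Stand-ins for the report's `extern int a, b, c;` (assumption: defined
   here only so that the sketch is self-contained). */
int a, b, c;

/* Shape of the GCC 6.3.0 output: load both values, then select one. */
int f_select_value(int x)
{
    intptr_t va = a;              /* sign-extending load, cf. movslq a(%rip) */
    intptr_t vb = b;              /* sign-extending load, cf. movslq b(%rip) */
    intptr_t addr = x ? va : vb;  /* candidate for a conditional move */
    __builtin_prefetch((void *) addr);
    return c;
}

/* Shape of the clang output: select the address, then do a single load. */
int f_select_address(int x)
{
    const int *p = x ? &a : &b;   /* candidate for a conditional move */
    __builtin_prefetch((void *) (intptr_t) *p);
    return c;
}
```

Both functions return c and prefetch the same address as the original f for any x, so they can be swapped into the test case to compare the generated code.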
In my tests, the 6.3.0 code is equally fast in the x == 0 and x != 0 cases,
whereas the trunk/8.0.1 code runs at only half the speed of the 6.3.0 code in
the x == 0 (branch taken) case. In the x != 0 (branch not taken) case, the
8.0.1 code is as fast as the 6.3.0 code.
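The report does not show how these timings were taken. One possible way to compare the two cases is sketched below. The helper time_calls, the loop structure and the use of clock_gettime are assumptions, not the reporter's method, and a loop with a constant x may not reproduce the numbers above. The intptr_t cast is added only to avoid an int-to-pointer size warning; GCC documents that an int converted to a wider pointer is sign-extended, so the prefetched address should be unchanged.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Stand-ins for the report's `extern int a, b, c;` (assumption: defined
   here only so that the sketch is self-contained). */
int a, b, c;

/* The test case from the report; noinline keeps the call inside the loop. */
__attribute__((noinline)) int f(int x)
{
    __builtin_prefetch((void *) (intptr_t) (x ? a : b));
    return c;
}

/* Hypothetical helper: average nanoseconds per call of f(x) over `iters`
   calls. Not part of the original report. */
double time_calls(int x, long iters)
{
    struct timespec t0, t1;
    volatile int sink = 0;  /* keeps the return values live */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++)
        sink += f(x);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void) sink;

    double ns = (double) (t1.tv_sec - t0.tv_sec) * 1e9
              + (double) (t1.tv_nsec - t0.tv_nsec);
    return ns / (double) iters;
}
```

Calling time_calls(0, n) and time_calls(1, n) with the same large n, once per compiler version, gives a per-call figure for each case that can be compared directly.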