https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71725
Bug ID: 71725
Summary: Backend decides to generate larger and possibly slower
float ops for integer ops that appear in source
Product: gcc
Version: 7.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---
Target: x86_64-*-*
The following testcase, derived from gcc.target/i386/xorps-sse2.c (see PR54716),
generates an FP op for the xor, which uses a larger opcode and is possibly slower
when g is a trap/denormal representation(?):
#define vector __attribute__ ((vector_size (16)))

vector int x(vector float f, vector int h)
{
  vector int g = { 0x80000000, 0, 0x80000000, 0 };
  vector int f_int = (vector int) f;
  return (f_int ^ g) + h;
}
x:
.LFB1:
	.cfi_startproc
	xorps	.LC0(%rip), %xmm0
	paddd	%xmm1, %xmm0
	ret
Flags used are -O -msse2 -mno-sse3.
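
For comparison, the integer-domain sequence one might expect from the source
would use pxor instead. A sketch (assuming the constant pool entry is still
emitted as .LC0); on many microarchitectures this keeps the xor in the same
execution domain as paddd and avoids a bypass delay between the two:

x:
.LFB1:
	.cfi_startproc
	pxor	.LC0(%rip), %xmm0	# SSE2 integer xor, same domain as paddd
	paddd	%xmm1, %xmm0
	ret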
Today, r191827 might be better implemented in something like the STV pass, which
can apply logic that isn't localized to a single instruction but considers the
surrounding context, which is what the gcc.target/i386/xorps-sse2.c testcase
claims to test.
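
To illustrate why per-instruction logic is insufficient, consider a hypothetical
variant (the function y below is made up for illustration, not part of the
testsuite) where the xor result feeds an FP op rather than an integer one; in
that context the FP-domain xorps is arguably the right choice:

#define vector __attribute__ ((vector_size (16)))

/* Hypothetical variant: the xor result feeds an FP add, so staying in the
   FP domain (xorps) is arguably right here, while in x() above the result
   feeds paddd and the integer domain (pxor) would be preferable.  */
vector float y(vector float f)
{
  vector int g = { 0x80000000, 0, 0x80000000, 0 };
  return (vector float) ((vector int) f ^ g) + f;
}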