https://bugs.llvm.org/show_bug.cgi?id=37344
Bug ID: 37344
Summary: vector approximate reciprocal square root generates bad code on x86
Product: new-bugs
Version: trunk
Hardware: PC
OS: All
Status: NEW
Severity: enhancement
Priority: P
Component: new bugs
Assignee: [email protected]
Reporter: [email protected]
CC: [email protected]
The following LLVM IR (see it live: https://godbolt.org/g/88kuky) computes the
approximate vector reciprocal square root, rsqrt(x) ~= 1/sqrt(x):
declare <4 x float> @llvm.sqrt.v4f32(<4 x float>)
define <4 x float> @rsqrt(<4 x float>) {
  %a = call afn <4 x float> @llvm.sqrt.v4f32(<4 x float> %0)
  %c = fdiv <4 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>, %a
  ret <4 x float> %c
}
On x86_64 with -O3 and sse4.2, this generates the following assembly:
.LCPI0_0:
  .long 1065353216 # float 1
  .long 1065353216 # float 1
  .long 1065353216 # float 1
  .long 1065353216 # float 1
rsqrt: # @rsqrt
  sqrtps xmm1, xmm0
  movaps xmm0, xmmword ptr [rip + .LCPI0_0]
  divps xmm0, xmm1
  ret
However, it should just generate an rsqrtps instruction.
I've tried adding fast-math flags, but haven't been able to get rsqrtps generated yet.
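For comparison, the instruction in question can be reached directly from C through the SSE intrinsic _mm_rsqrt_ps in <xmmintrin.h>. The sketch below only illustrates the desired result and is not part of the reproducer above; the function name rsqrt_expected is hypothetical, and it assumes an x86 target with SSE enabled.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_rsqrt_ps */

/* Hypothetical helper (name made up for illustration):
 * returns an approximate 1/sqrt(x) for each of the four lanes.
 * With optimization enabled, this typically lowers to one rsqrtps. */
__m128 rsqrt_expected(__m128 x) {
    return _mm_rsqrt_ps(x);
}
```

Note that rsqrtps is only an estimate (Intel documents a maximum relative error of 1.5 * 2^-12), so it is not a bit-exact replacement for sqrtps followed by divps.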