https://bugs.llvm.org/show_bug.cgi?id=37344

            Bug ID: 37344
           Summary: vector approximate reciprocal square root generates
                    bad code on x86
           Product: new-bugs
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: new bugs
          Assignee: [email protected]
          Reporter: [email protected]
                CC: [email protected]

The following LLVM IR (see it live: https://godbolt.org/g/88kuky) computes the
approximate vector reciprocal square root, rsqrt(x) ~= 1/sqrt(x):

declare <4 x float> @llvm.sqrt.v4f32(<4 x float>)
define <4 x float> @rsqrt(<4 x float>)  {
  %a = call afn <4 x float> @llvm.sqrt.v4f32(<4 x float> %0)
  %c = fdiv <4 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>, %a
  ret <4 x float> %c
}

On x86_64 with -O3 and SSE4.2, this generates the following assembly:

.LCPI0_0:
  .long 1065353216 # float 1
  .long 1065353216 # float 1
  .long 1065353216 # float 1
  .long 1065353216 # float 1
rsqrt: # @rsqrt
  sqrtps xmm1, xmm0
  movaps xmm0, xmmword ptr [rip + .LCPI0_0]
  divps xmm0, xmm1
  ret

However, it should just generate a single rsqrtps instruction.

I've tried adding fast-math flags, but haven't been able to get rsqrtps generated yet.
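For comparison, here is a minimal C sketch of the three variants using the SSE intrinsics from <xmmintrin.h>. It assumes an x86 target with SSE, and the helper names rsqrt_exact, rsqrt_estimate and rsqrt_refined are hypothetical. _mm_rsqrt_ps maps to the single rsqrtps instruction, whose maximum relative error Intel documents as 1.5 * 2^-12. The Newton-Raphson refinement step is not part of this report; it is shown only as one common way to recover precision when the raw estimate is too coarse.

```c
#include <xmmintrin.h> /* SSE: __m128, _mm_rsqrt_ps, _mm_sqrt_ps, _mm_div_ps, ... */

/* Exact 1/sqrt(x) per lane: the sqrtps + divps sequence shown above. */
static __m128 rsqrt_exact(__m128 x) {
    return _mm_div_ps(_mm_set1_ps(1.0f), _mm_sqrt_ps(x));
}

/* Hardware estimate: a single rsqrtps (relative error <= 1.5 * 2^-12). */
static __m128 rsqrt_estimate(__m128 x) {
    return _mm_rsqrt_ps(x);
}

/* Estimate plus one Newton-Raphson step (an assumption, not from the report):
   y1 = y0 * (1.5 - 0.5 * x * y0 * y0), which roughly doubles the correct bits. */
static __m128 rsqrt_refined(__m128 x) {
    const __m128 half = _mm_set1_ps(0.5f);
    const __m128 three_halves = _mm_set1_ps(1.5f);
    __m128 y0 = _mm_rsqrt_ps(x);
    /* t = 0.5 * x * y0^2 */
    __m128 t = _mm_mul_ps(_mm_mul_ps(half, x), _mm_mul_ps(y0, y0));
    return _mm_mul_ps(y0, _mm_sub_ps(three_halves, t));
}
```

For example, rsqrt_estimate(_mm_set1_ps(4.0f)) returns lanes close to, but not necessarily exactly, 0.5. The refined variant trades a few extra multiplies and a subtract for near full single precision, which is the kind of cost a compiler would weigh before choosing rsqrtps over sqrtps + divps.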

_______________________________________________
llvm-bugs mailing list
[email protected]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
