Bug ID: 37502
           Summary: _mm_set_ps is lowered badly with sse4
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Backend: X86

#include <xmmintrin.h>

__m128 f(float aYScale, float aXScale) {
   return _mm_set_ps(aYScale, aXScale, aYScale, aXScale);
}

With -mssse3 this compiles to:

        unpcklps        %xmm0, %xmm1    # xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
        movddup %xmm1, %xmm0            # xmm0 = xmm1[0,0]

With -msse4 this compiles to:
        movaps  %xmm1, %xmm2
        insertps        $16, %xmm0, %xmm2 # xmm2 = xmm2[0],xmm0[0],xmm2[2,3]
        insertps        $32, %xmm1, %xmm2 # xmm2 = xmm2[0,1],xmm1[0],xmm2[3]
        insertps        $48, %xmm0, %xmm2 # xmm2 = xmm2[0,1,2],xmm0[0]
        movaps  %xmm2, %xmm0
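
To make the comparison concrete, here is a rough scalar model of the two instruction sequences in portable C. It is only a sketch: the vec4 type and the model_* helpers are hypothetical names invented for illustration, not LLVM or intrinsic APIs. It assumes aYScale arrives in xmm0 and aXScale in xmm1, as the listings above suggest, and fills the unused upper lanes with a junk value to show that neither sequence leaks them into the result.

```c
#include <string.h>

/* Hypothetical scalar stand-in for a 128-bit XMM register of four floats. */
typedef struct { float lane[4]; } vec4;

/* unpcklps src, dst (AT&T order): dst = dst[0], src[0], dst[1], src[1]. */
static vec4 model_unpcklps(vec4 dst, vec4 src) {
    vec4 r = {{ dst.lane[0], src.lane[0], dst.lane[1], src.lane[1] }};
    return r;
}

/* movddup: duplicate the low 64 bits, i.e. float lanes 0 and 1. */
static vec4 model_movddup(vec4 src) {
    vec4 r = {{ src.lane[0], src.lane[1], src.lane[0], src.lane[1] }};
    return r;
}

/* insertps imm8: bits 7:6 pick the source lane, bits 5:4 the destination
   lane, and bits 3:0 are a mask of result lanes to zero. */
static vec4 model_insertps(vec4 dst, vec4 src, unsigned imm8) {
    unsigned src_lane = (imm8 >> 6) & 3u;
    unsigned dst_lane = (imm8 >> 4) & 3u;
    unsigned zero_mask = imm8 & 0xFu;
    unsigned i;
    dst.lane[dst_lane] = src.lane[src_lane];
    for (i = 0; i < 4; ++i) {
        if (zero_mask & (1u << i)) {
            dst.lane[i] = 0.0f;
        }
    }
    return dst;
}

/* The two-instruction sequence emitted with SSSE3. */
static vec4 model_ssse3(float aYScale, float aXScale) {
    vec4 xmm0 = {{ aYScale, -1.0f, -1.0f, -1.0f }}; /* upper lanes: junk */
    vec4 xmm1 = {{ aXScale, -1.0f, -1.0f, -1.0f }};
    xmm1 = model_unpcklps(xmm1, xmm0);
    xmm0 = model_movddup(xmm1);
    return xmm0;
}

/* The five-instruction sequence emitted with SSE4. */
static vec4 model_sse4(float aYScale, float aXScale) {
    vec4 xmm0 = {{ aYScale, -1.0f, -1.0f, -1.0f }};
    vec4 xmm1 = {{ aXScale, -1.0f, -1.0f, -1.0f }};
    vec4 xmm2 = xmm1;                       /* movaps %xmm1, %xmm2 */
    xmm2 = model_insertps(xmm2, xmm0, 16);  /* lane 1 <- xmm0[0] */
    xmm2 = model_insertps(xmm2, xmm1, 32);  /* lane 2 <- xmm1[0] */
    xmm2 = model_insertps(xmm2, xmm0, 48);  /* lane 3 <- xmm0[0] */
    xmm0 = xmm2;                            /* movaps %xmm2, %xmm0 */
    return xmm0;
}

/* Bytewise lane comparison; adequate for the ordinary values used here. */
static int vec4_equal(vec4 a, vec4 b) {
    return memcmp(a.lane, b.lane, sizeof a.lane) == 0;
}
```

Under this model both sequences yield the lanes aXScale, aYScale, aXScale, aYScale (lowest lane first), so the longer SSE4 sequence does not compute anything the shorter one misses.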

llvm-mca -mcpu=haswell agrees that the ssse3 version is better (its summary is listed first, followed by the sse4 version's):

Iterations:     1
Instructions:   2
Total Cycles:   5
Dispatch Width: 4
IPC:            0.40


Iterations:     1
Instructions:   5
Total Cycles:   8
Dispatch Width: 4
IPC:            0.62
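
Note that the sse4 version shows the higher IPC even though it is the slower one. As a small sketch of the arithmetic, assuming IPC in these summaries is simply instructions divided by total cycles (ipc and fewer_cycles are hypothetical helper names, not part of llvm-mca):

```c
/* IPC as it appears in the summaries above: instructions / total cycles. */
static double ipc(unsigned instructions, unsigned total_cycles) {
    return (double)instructions / (double)total_cycles;
}

/* For a single iteration of the same computation, the sequence with fewer
   total cycles is the better one, regardless of which has the higher IPC.
   Returns 1 if the first sequence takes fewer cycles, otherwise 0. */
static int fewer_cycles(unsigned cycles_a, unsigned cycles_b) {
    return cycles_a < cycles_b;
}
```

With the numbers above, ipc(2, 5) is 0.4 and ipc(5, 8) is 0.625, while fewer_cycles(5, 8) returns 1: the ssse3 version finishes in 5 cycles against 8, so Total Cycles rather than IPC is the figure to compare here.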
