http://llvm.org/bugs/show_bug.cgi?id=21943

            Bug ID: 21943
           Summary: [AVX/AVX2] Inefficient vector shuffle lowering: 5
                    instructions instead of one movhps|d
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: Backend: X86
          Assignee: [email protected]
          Reporter: [email protected]
                CC: [email protected]
    Classification: Unclassified

Created attachment 13558
  --> http://llvm.org/bugs/attachment.cgi?id=13558&action=edit
IR to reproduce the problem.

Tested with trunk r224470.

In the attached IR, we fail to recognize a movhps|d pattern, i.e., res =
input1[0,1], input2[0,1].
Instead, we generate a long sequence of vector shuffles to produce the desired
output.

Interestingly, this problem happens only when AVX and/or AVX2 are enabled. SSE
lowering works just fine.
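For readers without the attachment, a minimal reduction of the pattern looks roughly like this (hypothetical function name and types; the attached IR may differ):

```llvm
; res = a[0,1], b[0,1] -- with one operand coming from memory this
; should lower to a single (v)movhps load instead of five shuffles.
define <4 x float> @test(<4 x float>* %pa, <4 x float>* %pb) {
  %a = load <4 x float>* %pa
  %b = load <4 x float>* %pb
  %res = shufflevector <4 x float> %a, <4 x float> %b,
                       <4 x i32> <i32 0, i32 1, i32 4, i32 5>
  ret <4 x float> %res
}
```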


** To Reproduce **

llc -mtriple=x86_64-apple-macosx avx_movhps.ll -o - -mattr=+avx[2]


** Result **

The output shows the lowering of three different functions that do the exact
same thing but with different IR. Both the second (baz) and third (bar)
functions are canonicalized to the IR of the second function when run through
opt.
Right now, llc gives the correct result only for the third one (i.e., the
non-canonicalized one).

_foo:                                   ## @foo
    .cfi_startproc
## BB#0:                                ## %for.body
    vmovq    (%rsi), %xmm0
    vmovq    (%rdi), %xmm1
    vpermilps    $-27, %xmm1, %xmm2 ## xmm2 = xmm1[1,1,2,3]
    vinsertps    $16, %xmm2, %xmm1, %xmm1 ## xmm1 = xmm1[0],xmm2[0],xmm1[2,3]
    vinsertps    $32, %xmm0, %xmm1, %xmm1 ## xmm1 = xmm1[0,1],xmm0[0],xmm1[3]
    vpermilps    $-27, %xmm0, %xmm0 ## xmm0 = xmm0[1,1,2,3]
    vinsertps    $48, %xmm0, %xmm1, %xmm0 ## xmm0 = xmm1[0,1,2],xmm0[0]
    retq
    .cfi_endproc

    .globl    _baz
    .align    4, 0x90
_baz:                                   ## @baz
    .cfi_startproc
## BB#0:                                ## %for.body
    vmovq    (%rsi), %xmm0
    vmovq    (%rdi), %xmm1
    vpermilps    $-27, %xmm1, %xmm2 ## xmm2 = xmm1[1,1,2,3]
    vinsertps    $16, %xmm2, %xmm1, %xmm1 ## xmm1 = xmm1[0],xmm2[0],xmm1[2,3]
    vinsertps    $32, %xmm0, %xmm1, %xmm1 ## xmm1 = xmm1[0,1],xmm0[0],xmm1[3]
    vpermilps    $-27, %xmm0, %xmm0 ## xmm0 = xmm0[1,1,2,3]
    vinsertps    $48, %xmm0, %xmm1, %xmm0 ## xmm0 = xmm1[0,1,2],xmm0[0]
    retq
    .cfi_endproc

    .globl    _bar
    .align    4, 0x90
_bar:                                   ## @bar
    .cfi_startproc
## BB#0:                                ## %for.body
    vmovq    (%rdi), %xmm0
    vmovhpd    (%rsi), %xmm0, %xmm0
    retq
    .cfi_endproc

For the record, here is the assembly without -mattr:
_foo:                                   ## @foo
    .cfi_startproc
## BB#0:                                ## %for.body
    movq    (%rdi), %xmm0
    movhpd    (%rsi), %xmm0
    retq
    .cfi_endproc

    .globl    _baz
    .align    4, 0x90
_baz:                                   ## @baz
    .cfi_startproc
## BB#0:                                ## %for.body
    movq    (%rdi), %xmm0
    movhpd    (%rsi), %xmm0
    retq
    .cfi_endproc

    .globl    _bar
    .align    4, 0x90
_bar:                                   ## @bar
    .cfi_startproc
## BB#0:                                ## %for.body
    movq    (%rdi), %xmm0
    movhpd    (%rsi), %xmm0
    retq
    .cfi_endproc

** Note **

Interestingly, the first two instructions of the current lowering produce the
identity:
    vpermilps    $-27, %xmm1, %xmm2 ## xmm2 = xmm1[1,1,2,3]
    vinsertps    $16, %xmm2, %xmm1, %xmm1 ## xmm1 = xmm1[0],xmm2[0],xmm1[2,3]
In other words, we should not emit them even when we fail to match the
movhps|d pattern.
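To check this, one can model the lane semantics of the two instructions on a symbolic 4-lane register (a quick sanity check, not part of the reproducer):

```python
# Letters stand for the four 32-bit lanes of %xmm1.
xmm1 = ["a", "b", "c", "d"]

# vpermilps $-27 (imm 0xe5): xmm2 = xmm1[1,1,2,3]
xmm2 = [xmm1[1], xmm1[1], xmm1[2], xmm1[3]]

# vinsertps $16: xmm1 = xmm1[0], xmm2[0], xmm1[2,3]
xmm1 = [xmm1[0], xmm2[0], xmm1[2], xmm1[3]]

print(xmm1)  # ['a', 'b', 'c', 'd'] -- unchanged, so the pair is dead
```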

_______________________________________________
LLVMbugs mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/llvmbugs