https://bugs.llvm.org/show_bug.cgi?id=37087

            Bug ID: 37087
           Summary: vpmovmskb+cmp equivalent to vmovmskps+cmp should maybe
                    lower to vmovmskps
           Product: new-bugs
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: new bugs
          Assignee: unassignedb...@nondot.org
          Reporter: gonzalob...@gmail.com
                CC: llvm-bugs@lists.llvm.org

This snippet of code (see it live: https://godbolt.org/g/NuiGgc):
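(The snippet itself is not quoted in this message; the following is a plausible reconstruction from the assembly below, assuming SSE intrinsics and 16-byte-aligned inputs — names and signatures taken from the labels in the generated code.)

```c
#include <immintrin.h>
#include <stdbool.h>

/* Compare four floats lane-wise for equality, then test whether every
 * lane matched.  wrong_instr extracts the mask with the byte-wise
 * movemask (16 sign bits), correct_instr with the float-lane movemask
 * (4 sign bits). */

bool wrong_instr(const float *a, const float *b) {
    __m128 eq = _mm_cmpeq_ps(_mm_load_ps(b), _mm_load_ps(a));
    /* 16 byte sign bits -> all set means 0xFFFF */
    return _mm_movemask_epi8(_mm_castps_si128(eq)) == 0xFFFF;
}

bool correct_instr(const float *a, const float *b) {
    __m128 eq = _mm_cmpeq_ps(_mm_load_ps(b), _mm_load_ps(a));
    /* 4 dword sign bits -> all set means 0xF */
    return _mm_movemask_ps(eq) == 0xF;
}
```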

Generates:

wrong_instr: # @wrong_instr
  vmovaps xmm0, xmmword ptr [rsi]
  vcmpeqps xmm0, xmm0, xmmword ptr [rdi]
  vpmovmskb eax, xmm0
  cmp eax, 65535
  sete al
  ret
correct_instr: # @correct_instr
  vmovaps xmm0, xmmword ptr [rsi]
  vcmpeqps xmm0, xmm0, xmmword ptr [rdi]
  vmovmskps eax, xmm0
  cmp eax, 15
  sete al
  ret

Note how "wrong_instr" uses, as specified, vpmovmskb. AFAICT both snippets are
semantically equivalent: vcmpeqps sets each 32-bit lane to all-ones or
all-zeros, so all 16 byte sign bits are set (0xFFFF) exactly when all 4 dword
sign bits are set (0xF).

On Broadwell and Haswell these instructions have identical performance (from
Agner Fog's instruction tables):

PMOVMSKB r,v   mops fused: 1 mops unfused: 1 ports: p0 latency: 3 throughput: 1
MOVMSKPS r32,x mops fused: 1 mops unfused: 1 ports: p0 latency: 3 throughput: 1

On Skylake, MOVMSKPS appears to be slightly better:

PMOVMSKB r,v   mops fused: 1 mops unfused: 1 ports: p0 latency: 2-3 throughput: 1
MOVMSKPS r32,x mops fused: 1 mops unfused: 1 ports: p0 latency: 2   throughput: 1

Depending on the CPU, switching from the floating-point domain to the integer
domain can also incur a bypass (domain-crossing) delay on the comparison
result, in which case movmskps would be preferable in this situation.
