https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78954

            Bug ID: 78954
           Summary: optimization: broadcast of non-constant scalar into
                    SSE2 register
           Product: gcc
           Version: 6.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jens.maurer at gmx dot net
  Target Milestone: ---

The following code stores "x" to the stack and reloads it, instead of moving it
directly from its register into (the low part of) "v":

#pragma GCC target ("sse2")
typedef unsigned int V __attribute__((vector_size(16)));

V f(int x)
{
  V v = { x, x, x, x };
  return v;
}

$ gcc -v -O3 -S x.cc
Target: x86_64-pc-linux-gnu
gcc version 6.3.0 (GCC) 

Snippet from the generated assembly:

        movl    %edi, -12(%rsp)
        movd    -12(%rsp), %xmm1
        pshufd  $0, %xmm1, %xmm0
        ret

Why does the value take a round trip through the stack instead of a simple
register-to-register move like this?

        movd    %edi, %xmm1
        pshufd  $0, %xmm1, %xmm0
