https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78954
            Bug ID: 78954
           Summary: optimization: broadcast of non-constant scalar into SSE2 register
           Product: gcc
           Version: 6.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jens.maurer at gmx dot net
  Target Milestone: ---

The following code goes through the stack instead of directly moving from the register for "x" into (the low part of) "v":

#pragma GCC target ("sse2")
typedef unsigned int V __attribute__((vector_size(16)));

V f(int x)
{
  V v = { x, x, x, x };
  return v;
}

$ gcc -v -O3 -S x.cc
Target: x86_64-pc-linux-gnu
gcc version 6.3.0 (GCC)

snippet from assembly:

        movl    %edi, -12(%rsp)
        movd    -12(%rsp), %xmm1
        pshufd  $0, %xmm1, %xmm0
        ret

Why do we move through the stack, instead of using a simple register move?

        movd    %edi, %xmm1
        pshufd  $0, %xmm1, %xmm0
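For comparison, the same broadcast can be spelled out with SSE2 intrinsics: `_mm_cvtsi32_si128` is the intrinsic for `movd` from a general-purpose register into the low lane of an XMM register, and `_mm_shuffle_epi32` with an immediate of 0 is `pshufd $0`. The sketch below is a minimal illustration, assuming an x86-64 target (where SSE2 is always available); the helper names `broadcast_vecext`, `broadcast_sse2`, and `broadcasts_agree` are hypothetical. Intrinsics only state the operation, so which instructions GCC actually emits (including whether it still goes through the stack) is up to the compiler and has to be checked with `-S` as above.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_cvtsi32_si128, _mm_shuffle_epi32 (x86 only) */
#include <string.h>     /* memcmp */

/* Same vector type as in the report. */
typedef unsigned int V __attribute__((vector_size(16)));

/* The report's formulation: GCC vector-extension initializer. */
V broadcast_vecext(int x)
{
    V v = { x, x, x, x };
    return v;
}

/* Hypothetical explicit SSE2 formulation mirroring the hand-written sequence:
 *   _mm_cvtsi32_si128(x)     ~ movd   %edi, %xmm1        (x in lane 0, lanes 1-3 zeroed)
 *   _mm_shuffle_epi32(v, 0)  ~ pshufd $0, %xmm1, %xmm0   (lane 0 copied to all four lanes)
 * The compiler still chooses the final instructions; this does not force
 * a register-to-register move. */
__m128i broadcast_sse2(int x)
{
    __m128i low = _mm_cvtsi32_si128(x);
    return _mm_shuffle_epi32(low, 0);
}

/* Returns 1 if both formulations produce the same 128-bit value for x. */
int broadcasts_agree(int x)
{
    V a = broadcast_vecext(x);
    __m128i b = broadcast_sse2(x);
    return memcmp(&a, &b, sizeof a) == 0;
}
```

Compiling this file with `gcc -O3 -S` and comparing the code generated for `broadcast_vecext` and `broadcast_sse2` is one way to check whether a given GCC version treats the two formulations differently.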