https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87599
Bug ID: 87599
Summary: Broadcasting scalar to vector uses stack unnecessarily on x86
Product: gcc
Version: 8.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: vgatherps at gmail dot com
Target Milestone: --

When compiled with GCC 8.2 and -O2, the following code

    typedef long long __m128i __attribute__ ((__vector_size__ (16), __may_alias__));

    __m128i vectorize(long val) {
        __m128i rval = {val, val};
        return rval;
    }

generates:

    mov        QWORD PTR [rsp-16], rdi
    movq       xmm0, QWORD PTR [rsp-16]
    punpcklqdq xmm0, xmm0
    ret

The store to the stack and the reload from it are unnecessary. The value can be moved directly from rdi into xmm0:

    movq       xmm0, rdi
    punpcklqdq xmm0, xmm0
    ret

Interestingly, according to godbolt, the current trunk performs this optimization with -Os but not with -O2 or -O3.
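For comparison, here is a small sketch (not part of the original report) that places the reported function next to an SSE2 intrinsic spelling of the proposed sequence. `_mm_cvtsi64_si128` corresponds to `movq xmm, r64` and `_mm_unpacklo_epi64` to `punpcklqdq`. The names `v2di` and `vectorize_sse2` are hypothetical. Using intrinsics only expresses the intended operations: the compiler is still free to pick a different instruction sequence, so this may or may not avoid the stack round trip on a given GCC version.

```c
#include <assert.h>

/* Same layout as the report's __m128i: two 64-bit lanes.
   Named v2di (hypothetical) so it cannot clash with the
   __m128i typedef that <emmintrin.h> provides. */
typedef long long v2di __attribute__ ((__vector_size__ (16)));

/* Portable form from the report: broadcast val into both lanes. */
static v2di vectorize(long val)
{
    v2di rval = {val, val};
    return rval;
}

#if defined(__x86_64__) && defined(__SSE2__)
#include <emmintrin.h>

/* Hypothetical intrinsic version of the proposed sequence:
   _mm_cvtsi64_si128  -> movq xmm0, rdi
   _mm_unpacklo_epi64 -> punpcklqdq xmm0, xmm0
   The actual code generation is still up to the compiler. */
static __m128i vectorize_sse2(long val)
{
    __m128i lo = _mm_cvtsi64_si128(val);  /* val in lane 0, lane 1 zeroed */
    return _mm_unpacklo_epi64(lo, lo);    /* copy lane 0 into lane 1 */
}
#endif
```

Compiling this with `gcc -O2 -S -masm=intel` and comparing the output for `vectorize` and `vectorize_sse2` is one way to check whether a given compiler version still routes the value through the stack.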