https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87599

            Bug ID: 87599
           Summary: Broadcasting scalar to vector uses stack unnecessarily
                    on x86
           Product: gcc
           Version: 8.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vgatherps at gmail dot com
  Target Milestone: ---

When compiled with GCC 8.2 at -O2, this code

typedef long long __m128i __attribute__ ((__vector_size__ (16),
__may_alias__));

__m128i vectorize(long val) {
    __m128i rval = {val, val};
    return rval;
}

generates the following assembly:

    mov     QWORD PTR [rsp-16], rdi
    movq    xmm0, QWORD PTR [rsp-16]
    punpcklqdq      xmm0, xmm0
    ret

This could be replaced with the following, which avoids the round trip
through the stack:

    movq    xmm0, rdi
    punpcklqdq      xmm0, xmm0
    ret

Interestingly, according to Godbolt (Compiler Explorer), current trunk performs
this optimization at -Os but not at -O2 or -O3.
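
For anyone reproducing this locally, here is a minimal sketch of the same
broadcast together with a small lane check. The names v2di, broadcast and
lanes_equal are hypothetical and not part of the original test case; v2di is
used instead of __m128i only to avoid clashing with <emmintrin.h>. It assumes
GCC's vector extensions, including vector subscripting in C.

```c
/* 16-byte vector of two 64-bit lanes, mirroring the typedef in the report.
   The name v2di is an assumption chosen for this sketch. */
typedef long long v2di __attribute__ ((__vector_size__ (16), __may_alias__));

/* Same scalar-to-vector broadcast as vectorize() in the report. */
v2di broadcast(long long val) {
    v2di rval = {val, val};
    return rval;
}

/* Hypothetical helper: returns 1 if both lanes hold val, 0 otherwise.
   Relies on GCC's vector subscripting extension. */
int lanes_equal(v2di v, long long val) {
    return v[0] == val && v[1] == val;
}
```

Compiling the file with "gcc -O2 -S -masm=intel" and again with -Os should
make it possible to compare the assembly emitted for broadcast() at each
optimization level.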
