https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91526

            Bug ID: 91526
           Summary: Unnecessary SSE and other instructions generated when
                    compiling in C mode (vs. C++ mode)
           Product: gcc
           Version: 9.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: warp at iki dot fi
  Target Milestone: ---

Consider the following piece of code:

//--------------------------------------------------------------
struct Vec { float v[8]; };

struct Vec multiply(const struct Vec* v1, const struct Vec* v2)
{
    struct Vec result;
    for(unsigned i = 0; i < 8; ++i)
        result.v[i] = v1->v[i] * v2->v[i];
    return result;
}
//--------------------------------------------------------------

If this is compiled as C++, using g++ 9.2 with options -Ofast -march=skylake,
the following result is produced:

_Z8multiplyPK3VecS1_:
  vmovups ymm0, YMMWORD PTR [rdx]
  mov rax, rdi
  vmulps ymm0, ymm0, YMMWORD PTR [rsi]
  vmovups YMMWORD PTR [rdi], ymm0
  vzeroupper
  ret

However, if it's compiled as C, using the same options, this is produced:

multiply:
  push rbp
  mov rax, rdi
  mov rbp, rsp
  and rsp, -32
  vmovups ymm0, YMMWORD PTR [rdx]
  vmulps ymm0, ymm0, YMMWORD PTR [rsi]
  vmovaps YMMWORD PTR [rsp-32], ymm0
  vmovdqa xmm2, XMMWORD PTR [rsp-16]
  vmovups XMMWORD PTR [rdi], xmm0
  vmovups XMMWORD PTR [rdi+16], xmm2
  vzeroupper
  leave
  ret

Not only is the computation surrounded by extra instructions (a stack frame is
set up and rsp is aligned to 32 bytes), but the result is also first stored to
the stack, and the store into [rdi] has for some reason been split into two
16-byte parts.

Both clang and icc generate the same code regardless of whether compiling as C
or C++, and that code is very similar to the first listing above.
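
For anyone wanting to reproduce the comparison, here is a minimal sketch of a
single file that is valid as both C and C++. It contains the function from the
report unchanged, plus a small check that the vectorized code still computes
the right values. The helper count_mismatches() and the file name vec.c used
below are hypothetical and not part of the original test case.

```c
struct Vec { float v[8]; };

/* The function from the report, unchanged. */
struct Vec multiply(const struct Vec* v1, const struct Vec* v2)
{
    struct Vec result;
    for(unsigned i = 0; i < 8; ++i)
        result.v[i] = v1->v[i] * v2->v[i];
    return result;
}

/* Hypothetical helper (not in the report): returns the number of elements
   in which multiply() disagrees with a plain scalar product. The inputs are
   chosen so that every product is exactly representable as a float. */
unsigned count_mismatches(void)
{
    struct Vec a, b;
    for(unsigned i = 0; i < 8; ++i)
    {
        a.v[i] = (float)(i + 1);
        b.v[i] = 0.5f;
    }

    const struct Vec r = multiply(&a, &b);

    unsigned mismatches = 0;
    for(unsigned i = 0; i < 8; ++i)
        if(r.v[i] != a.v[i] * b.v[i])
            ++mismatches;
    return mismatches;
}
```

The two listings can then presumably be compared by compiling the same file in
both modes, for example:

    gcc -x c   -Ofast -march=skylake -S -masm=intel vec.c -o vec_c.s
    gcc -x c++ -Ofast -march=skylake -S -masm=intel vec.c -o vec_cxx.s

The -masm=intel option is an assumption here; the report does not say how the
Intel-syntax output was obtained.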
