https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91526
Bug ID: 91526
Summary: Unnecessary SSE and other instructions generated when compiling in C mode (vs. C++ mode)
Product: gcc
Version: 9.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: warp at iki dot fi
Target Milestone: ---

Consider the following piece of code:

//--------------------------------------------------------------
struct Vec { float v[8]; };

struct Vec multiply(const struct Vec* v1, const struct Vec* v2)
{
    struct Vec result;
    for(unsigned i = 0; i < 8; ++i)
        result.v[i] = v1->v[i] * v2->v[i];
    return result;
}
//--------------------------------------------------------------

If this is compiled as C++, using g++ 9.2 with options -Ofast -march=skylake, the following result is produced:

_Z8multiplyPK3VecS1_:
        vmovups ymm0, YMMWORD PTR [rdx]
        mov     rax, rdi
        vmulps  ymm0, ymm0, YMMWORD PTR [rsi]
        vmovups YMMWORD PTR [rdi], ymm0
        vzeroupper
        ret

However, if it's compiled as C, using the same options, this is produced:

multiply:
        push    rbp
        mov     rax, rdi
        mov     rbp, rsp
        and     rsp, -32
        vmovups ymm0, YMMWORD PTR [rdx]
        vmulps  ymm0, ymm0, YMMWORD PTR [rsi]
        vmovaps YMMWORD PTR [rsp-32], ymm0
        vmovdqa xmm2, XMMWORD PTR [rsp-16]
        vmovups XMMWORD PTR [rdi], xmm0
        vmovups XMMWORD PTR [rdi+16], xmm2
        vzeroupper
        leave
        ret

Not only are extra instructions surrounding the code, but moreover the assignment of the result into [rdi] has for some reason been split into two parts.

Both clang and icc produce the same result (very similar to the first result above) regardless of whether compiling as C or C++.
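
For anyone reproducing this, a small self-check can confirm that both builds compute the same values, so the difference is limited to code generation. The following is only a sketch: `multiply_matches_reference` is a hypothetical helper that is not part of the original report, and the input values are arbitrary, chosen so that every product is exactly representable as a float.

```c
struct Vec { float v[8]; };

/* Function under test, copied from the report. */
struct Vec multiply(const struct Vec* v1, const struct Vec* v2)
{
    struct Vec result;
    for (unsigned i = 0; i < 8; ++i)
        result.v[i] = v1->v[i] * v2->v[i];
    return result;
}

/* Hypothetical helper (not in the report): returns 1 if multiply()
   yields the expected element-wise products for a fixed input, else 0. */
int multiply_matches_reference(void)
{
    struct Vec a, b;
    for (unsigned i = 0; i < 8; ++i) {
        a.v[i] = (float)(i + 1);   /* 1, 2, ..., 8 */
        b.v[i] = 2.0f;             /* products are exact in float */
    }

    struct Vec r = multiply(&a, &b);

    for (unsigned i = 0; i < 8; ++i) {
        if (r.v[i] != 2.0f * (float)(i + 1))
            return 0;
    }
    return 1;
}
```

The same source is valid as both C and C++, so it can be built with gcc and with g++ using the options from the report. The listings above are in Intel syntax, which suggests -masm=intel was also passed, though the report does not say so.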