https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85697

            Bug ID: 85697
           Summary: At -Os nontrivial ctor does not use SSE to zero
           Product: gcc
           Version: 8.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: msharov at users dot sourceforge.net
  Target Milestone: ---

struct alignas(16) A {
    A (void) :a(0),b(0),c(0),d(0) {}
    int a,b,c,d;
};
__attribute__((noinline)) void UseA (A& a) { a.a=1; }

int main (void)
{
    A a {};
    UseA (a);
    return a.a;
}

With -Os -march=native on Haswell, this generates:

main:
        subq    $16, %rsp
        movq    %rsp, %rdi
        movq    $0, (%rsp)
        movq    $0, 8(%rsp)
        call    _Z4UseAR1A
        movl    (%rsp), %eax
        addq    $16, %rsp
        ret

This uses 16 bytes to zero A with two movq instructions. With -O3:

main:
        subq    $24, %rsp
        vpxor   %xmm0, %xmm0, %xmm0
        movq    %rsp, %rdi
        vmovaps %xmm0, (%rsp)
        call    _Z4UseAR1A
        movl    (%rsp), %eax
        addq    $24, %rsp
        ret

This uses only 9 bytes for vpxor/vmovaps. With -mno-avx it is 7 bytes for
xorps/movaps. With multiple objects of type A, the savings would be even
greater, since only one pxor would be needed for all of them and only 4
bytes per object for the zeroing store.
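
A sketch of how the multiple-object case could be checked, assuming the
same struct A as above. UseTwo and TwoObjects are hypothetical names made
up for this illustration, and no generated assembly is shown for it;
compare the -Os and -O3 output to see how each object is zeroed.

```cpp
struct alignas(16) A {
    A (void) :a(0),b(0),c(0),d(0) {}
    int a,b,c,d;
};

// UseTwo is a hypothetical helper; noinline forces both objects to be
// materialized (and therefore zeroed) in memory before the call.
__attribute__((noinline)) void UseTwo (A& x, A& y) { x.a=1; y.a=2; }

// TwoObjects is a hypothetical stand-in for main with two objects to zero.
int TwoObjects (void)
{
    A x {};
    A y {};
    UseTwo (x, y);
    // The b members are still zero, so the sum is x.a + y.a.
    return x.a + y.a + x.b + y.b;
}
```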

Removing A's constructor also results in SSE instructions being used at -Os.
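
A minimal sketch of that variant, for comparison. B, UseB and WithoutCtor
are hypothetical names introduced here so that the snippet can sit
alongside the original test case; B is simply A without the user-provided
constructor.

```cpp
// B is a hypothetical copy of A with the constructor removed.
struct alignas(16) B {
    int a,b,c,d;
};

// Same shape as UseA above; noinline keeps the object in memory.
__attribute__((noinline)) void UseB (B& b) { b.a=1; }

// WithoutCtor is a hypothetical stand-in for main.
int WithoutCtor (void)
{
    B b {};        // value-initialization zeroes all four members
    UseB (b);
    // Only a was modified, so the other members contribute zero.
    return b.a + b.b + b.c + b.d;
}
```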
