https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82732
Bug ID: 82732
Summary: malloc+zeroing other than memset not optimized to calloc, so asm output is malloc+memset
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: peter at cordes dot ca
Target Milestone: ---

#include <string.h>
#include <stdlib.h>

int *foo(unsigned size) {
    int *p = malloc(size*sizeof(int));
    //memset(p,0, size*sizeof(int));
    for (unsigned i=0; i<size; i++) {
        p[i]=0;
    }
    return p;
}

gcc -O3 -march=haswell   https://godbolt.org/g/bpGHoa

        pushq   %rbx
        movl    %edi, %edi          # zero-extend size
        movq    %rdi, %rbx          # why 64-bit operand-size here?
        salq    $2, %rdi
        call    malloc
        movq    %rax, %rcx
        testl   %ebx, %ebx          # check that size was non-zero before looping
        je      .L6
        leal    -1(%rbx), %eax
        movq    %rcx, %rdi
        xorl    %esi, %esi
        leaq    4(,%rax,4), %rdx    # redo the left-shift
        call    memset
        movq    %rax, %rcx
.L6:
        movq    %rcx, %rax          # this is dumb: on either path to this point the malloc return value is already in %rax (memset returns its dst argument)
        popq    %rbx
        ret

So gcc does figure out that this is malloc+memset, but apparently not until after the pass that would recognize that pair as calloc.

With the explicit memset (the commented-out line) instead of the loop, gcc -O3 does optimize the whole thing to calloc:

foo:
        movl    %edi, %edi
        movl    $1, %esi
        salq    $2, %rdi
        jmp     calloc

Unfortunately, at -O2 we still get a loop that stores 4 bytes at a time, *after the call to calloc*. I know -O2 doesn't enable all the optimizations, but I thought it would do better than this with "manual" zeroing loops.