https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82732
Bug ID: 82732
Summary: malloc+zeroing other than memset not optimized to calloc, so asm output is malloc+memset
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: peter at cordes dot ca
Target Milestone: ---

#include <string.h>
#include <stdlib.h>

int *foo(unsigned size) {
    int *p = malloc(size*sizeof(int));
    //memset(p,0, size*sizeof(int));
    for (unsigned i=0; i<size; i++) {
        p[i]=0;
    }
    return p;
}

gcc -O3 -march=haswell   https://godbolt.org/g/bpGHoa

        pushq   %rbx
        movl    %edi, %edi          # zero-extend size
        movq    %rdi, %rbx          # why 64-bit operand-size here?
        salq    $2, %rdi
        call    malloc
        movq    %rax, %rcx
        testl   %ebx, %ebx          # check that size was non-zero before looping
        je      .L6
        leal    -1(%rbx), %eax
        movq    %rcx, %rdi
        xorl    %esi, %esi
        leaq    4(,%rax,4), %rdx    # redo the left-shift
        call    memset
        movq    %rax, %rcx
.L6:
        movq    %rcx, %rax          # this is dumb: on either path to this point the malloc return value is already in %rax (memset returns its dst argument)
        popq    %rbx
        ret

So gcc does figure out that this is malloc+memset, but apparently not until after the pass that would recognize that pair as calloc.

With the explicit memset (the commented-out line) instead of the loop, gcc -O3 does optimize the whole thing to calloc:

foo:
        movl    %edi, %edi
        movl    $1, %esi
        salq    $2, %rdi
        jmp     calloc

Unfortunately, at -O2 we still get a loop that stores 4 bytes at a time, *after the call to calloc*. I know -O2 doesn't enable all the optimizations, but I thought it would do better than this with "manual" zeroing loops.