bearophile wrote:
Frits van Bommel:

Thank you for your answers.

This one is only done for certain GC allocations by the way, not all of them. The ones currently implemented are:
  * new Struct/int/float/etc.,
  * uninitialized arrays (used for arr1 ~ arr2, for instance),
  * zero-initialized arrays (e.g. new int[N]),
  * new Class, unless
    a) it has a destructor,
    b) it has a custom allocator (overloads new), or
    c) it has a custom deallocator (overloads delete).
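
To make the list above concrete, here is a minimal sketch of allocations that would and would not be candidates under those rules. All type and function names (Point, Plain, WithDtor, usePoint, ...) are made up for illustration, and whether a given compiler version really removes each GC call is an assumption that has to be checked in the generated assembly.

```d
// Hypothetical types, for illustration only.
struct Point { int x, y; }

class Plain {                 // no destructor, no custom new/delete
    int value = 6;
}

class WithDtor {              // has a destructor, so it falls under "unless a)"
    int value = 6;
    ~this() {}
}

int usePoint() {
    auto p = new Point;       // "new Struct": candidate, p never escapes
    p.x = 3;
    return p.x + p.y;
}

int usePlain() {
    auto c = new Plain;       // "new Class" with no exclusions: candidate
    return c.value;
}

int useWithDtor() {
    auto c = new WithDtor;    // excluded by the destructor rule, stays a GC allocation
    return c.value;
}

int useArray() {
    auto a = new int[4];      // zero-initialized array: candidate
    a[0] = 1;
    return a[0] + a[3];
}
```

None of the pointers or references above leave the function that created them, which is the other condition discussed in this thread.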

I'm trying to find situations where that's true, but in two small programs that use both structs and classes (which don't escape the scope and satisfy the conditions in your "unless" list) I see:

        call    _d_allocmemoryT
        call    _d_allocclass

Are those calls to variants of alloca()?

No, those are GC allocations.

This small program contains no GC allocations when compiled with ldc -O3:
-----
struct Struct {
    int i, j = 4;
}

class Class {
    int i, j = 6;
}

int frob(T)(T t) {
    t.i = 4;
    return t.j;
}

int withStruct() {
    return frob(new Struct);
}

int withClass() {
    return frob(new Class);
}
-----

It does still contain them when inlining is disabled, as it is by default with -O2 (aka -O); this seems to be because the LLVM pass that adds parameter attributes (like nocapture, better known as 'scope' in these newsgroups) is missing from the default list of optimizations :(. I'll fix this in the repository soon.

Another constraint I forgot to mention: it doesn't work for allocations in loops, because it's tricky to figure out whether the allocation is still reachable when the loop reaches the same position again. (For this reason, the pass by default runs before each inliner run and once more after all inlining is done: the inliner can inline code into loops, but it also enables simplifications that make escape analysis more accurate.)
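
A small sketch of the loop constraint, using a hypothetical class Node and hypothetical function names. The first function has the shape of the earlier example; the second allocates inside a loop, which according to the paragraph above is not handled, so its allocation is expected to remain a GC call.

```d
// Hypothetical class, for illustration only.
class Node {
    int i, j = 6;
}

int outsideLoop() {
    auto n = new Node;        // not in a loop, does not escape: candidate
    n.i = 4;
    return n.j;
}

int insideLoop(int count) {
    int sum = 0;
    foreach (k; 0 .. count) {
        auto n = new Node;    // allocated in a loop: not handled per the post
        n.i = k;
        sum += n.i + n.j;
    }
    return sum;
}
```

Here n is clearly dead at the end of each iteration, but the analysis would have to prove that before the same stack slot could be reused on the next pass through the loop.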

While looking for those alloca calls, I also tested code that has the following two lines, one after the other:
    auto a = new int[1000];
    a[] = 2;

That code is very common, because you currently can't write:
    auto a = new int[1000] = 2;

The latest LDC compiles that as:

        pushl   %esi
        subl    $4016, %esp
        leal    16(%esp), %esi
        movl    %esi, (%esp)
        movl    $4000, 8(%esp)
        movl    $0, 4(%esp)
        call    memset
        movl    %esi, (%esp)
        movl    $2, 8(%esp)
        movl    $1000, 4(%esp)
        call    _d_array_init_i32

I think the memset may be avoided.

That's trickier to get right, because the optimizer would have to look ahead to see that the memset call for the new array is always followed by the initialization, with no reads in between. The 1-byte element case can probably be handled by LLVM if _d_array_init_i8 is replaced by another memset, though. (Similarly, _d_array_init_i16 could be turned into a memset for values like 0xFFFF, where both bytes are equal, but not for values like 0x1234.)
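
The 16-bit case comes down to whether both bytes of the fill value are the same, since memset can only replicate a single byte. A rough sketch of that check follows; fillableByMemset and fill16 are hypothetical helpers written for this example, not the runtime's _d_array_init_i16 and not anything LDC or LLVM actually contains.

```d
import core.stdc.string : memset;

// True when a 16-bit value consists of one repeated byte,
// e.g. 0xFFFF or 0xABAB, but not 0x1234.
bool fillableByMemset(ushort value) {
    return (value & 0xFF) == (value >> 8);
}

// Hypothetical fill routine: uses memset when the pattern allows it,
// and falls back to an element-wise fill otherwise.
void fill16(ushort[] arr, ushort value) {
    if (arr.length == 0)
        return;
    if (fillableByMemset(value))
        memset(arr.ptr, value & 0xFF, arr.length * ushort.sizeof);
    else
        arr[] = value;
}
```

Once both the zeroing and the fill are plain memset calls on the same memory, removing the first one is a dead-store question that an optimizer has a better chance of answering than with an opaque runtime call.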
