Joseph Wakeling:

> I was very happy to see that D _does_ have a 'reserve' function for arrays,
> which I had been missing compared to C++ (it's not mentioned in the array
> docs).

'reserve' is not a dynamic array method, it's a free function contained in the 
Phobos default module that also contains object, see the docs page:
http://www.digitalmars.com/d/2.0/phobos/object.html
It's not mentioned in the array docs because the 'reserve' and related 
functions are new, they come from a recent change in how dynamic arrays are 
implemented in D.


> Still, I don't think that pre-reserving the memory per se is the influencing
> factor on the differences in performance.

You are right. The difference in performance comes from mostly from the way D 
arrays are designed. I agree with you that dynamic array append in D is slow, 
in absolute terms. Comparing the performance of D code with C++ is a good thing 
to do, it gives a baseline for comparison.

D dynamic arrays are more flexible than C++ vector, they can be sliced, such 
slicing is O(1), and the slices are seen by the language just like other 
arrays. So you pay the price of some performance for such increased 
flexibility. The idea here is that the built-in data types must be as flexible 
as possible even if their performance is not so high, so they can be used for 
many different purposes. Then D standard library will have specialized data 
structures that are faster thanks to being more specialized and less flexible. 
In D dynamic arrays some of the performance price is also paid for the 
automatic memory management, for the GC that's not a precise GC (for example if 
your array has some empty items at the end past its true length, the GC must 
ignore them).

I think D dynamic arrays have a lower append performance because they don't 
contain the capacity in a handy place (their new design has a capacity, but 
it's not beside length and pointer). I don't know how the D array append works 
in situations of multithreading. If you don't require the slicing flexibility, 
you can use D language to implement a C++-style Vector in D. If you implement a 
struct that contains capacity, length and pointer to data, and you don't let 
the GC scan the memory of the data, and you add method overloading for the ~= 
that doubles the size of the memory with a realloc once in a while, you 
probably have a performance similar to the C++ one (not equal because the 
performance profile of the GC in allocating memory chunks is a little more 
complex than the C malloc). See the bottom of this post.


> D also uses about 20% more memory than the C++

I don't know why, I presume it's caused by the GC bookkeeping, it must be 
careful to avoid memory fragmentation. Currently the D GC is not so refined.


> Using an Appender class and put() (from std.array) is even slower,
> despite the std.array docs recommending this over ~. :-(

That was useful and designed for the precedent design of the dynamic arrays. 
Maybe it can be removed from Phobos now...


> It's disappointing because I'm loving so much of D's syntax, but I can
> write far more efficient code (with less explicit memory management)
> in C++ ...

As you know all data structures are the result of compromises, they are a 
balance of performance for their different uses, there is no perfect data 
structure; in practice D arrays are not designed for an efficient "en mass" 
appending. A D dynamic array is seen locally as a struct of 2 words. So it's 
cheap to copy and pass around. Why they don't contain the capacity too in such 
struct is a long story.

If you like to write benchmarks you will also find performance difference 
between for example a C++ hash_map and the built-in associative arrays of D.

Beside the design (that for example can make it hard to implement 
append-efficient dynamic arrays in D), D language is also rather new still, so 
far the focus was on its design, not in tuning the performance of its 
implementation. It's being several years they are performance-tuning C++ 
implementations :-)

-------------------

Below three implementations in D and C++. The Vector(T) is bare-bones. It 
missed most features and it's unsafe.

In the D2 version I don't use the GC, and the memory is deallocated when arr 
gets out of scope.

Timings, T=double, N=5_000_000, NLOOPS=100, seconds:
  D1:  38.20
  D2:   8.32
  C++:  6.65

You can see the performance of the append in D2 is a matter of data structure 
implementation, it's not a language problem :-)

With LDC (once we'll have a D2 version of it) the performance of D2 can 
probably be the same as the C++. DMD maybe loses a little here because it's not 
so good at inlining, or maybe because the C++ vector is better than this D2 
code.

-------------------

// First D version
import std.stdio: printf;

void main() {
    alias double T;
    enum int N = 5_000_000;
    enum int NLOOPS = 100;
    T[] arr;

    foreach (i; 0 .. NLOOPS) {
        arr.length = 0;
        arr.assumeSafeAppend;
        printf("Array capacity = %d\n", arr.capacity);

        foreach (uint j; 0 .. N)
            arr ~= j;

        printf("At iteration %u, x has %u elements.\n", i, arr.length);
    }
}

------------------

// Second D version
import std.stdio: printf;
import std.c.stdlib: malloc, realloc, free;

struct Vector(T) {
    enum double FAST_GROW = 2;
    enum double SLOW_GROW = 1.3;
    enum int STARTING_SIZE = 4;
    enum int BIG_MEM = (1 << 27) / T.sizeof;

    T* ptr;
    int length, capacity;

    ~this() {
        free(ptr);
        ptr = null;
        length = 0;
        capacity = 0;
    }

    void opOpAssign(string Op:"~=")(T item) {
        if (length >= capacity) {
            if (capacity == 0)
                _grow_capacity(STARTING_SIZE);
            else if (capacity < BIG_MEM)
                _grow_capacity(cast(int)(capacity * FAST_GROW));
            else
                _grow_capacity(cast(int)(capacity * SLOW_GROW));
        }

        ptr[length] = item;
        length++;
    }

    void _grow_capacity(int new_capacity) {
        ptr = cast(T*)realloc(ptr, new_capacity * T.sizeof);
        assert(ptr);
        this.capacity = new_capacity;
    }
}

void main() {
    alias double T;
    enum int N = 5_000_000;
    enum int NLOOPS = 100;
    Vector!T arr;

    foreach (i; 0 .. NLOOPS) {
        arr.length = 0;
        printf("Array capacity = %d\n", arr.capacity);

        foreach (uint j; 0 .. N)
            arr ~= j;

        printf("At iteration %u, x has %u elements.\n", i, arr.length);
    }
}

------------------

// C++ version

#include "stdio.h"
#include <vector>

int main() {
    typedef double T;
    const unsigned int N = 5000000;
    const unsigned int NLOOPS = 100;
    std::vector<T> arr;

    for (unsigned int i = 0; i < NLOOPS; i++) {
        arr.resize(0);
        printf("Array capacity = %d\n", arr.capacity());

        for (unsigned int j = 0; j < N; j++)
            arr.push_back(j);

        printf("At iteration %u, x has %u elements.\n", i, arr.size());
    }
}

Bye,
bearophile

Reply via email to