Hi!

I'm learning to use D collections properly, and I'm looking for a sorted data structure with logarithmic access time (i.e., a binary search tree will do, but a hash table would not help). As far as I can see, std.container.RedBlackTree is exactly what I need. However, I am not sure if I use it as intended as its performance seems inferior to a C++ STL solution (e.g., std::set).

To be more specific, right now I wonder what is the best (or intended) way to store an object in the RedBlackTree: should it be a class reference, or a struct (passed by value), or something quirkier like an integer pointing into an array or a simple pointer. The rest of my program suggested to use structs, but the whole thing turned out to be rather slow, and the profiler told me that these structs are being copied around much more than I anticipated.

And so I wrote a minimalistic test program to check the number of copy (postblit) constructor calls. Here is the D version:

-----
import std.container;
import std.stdio;

immutable int LIMIT = 100000;

struct element
{
        static int postblit_counter;

        long x;

        int opCmp (ref element other)
        {
                return (x > other.x) - (x < other.x);
        }

        this (long nx)
        {
                x = nx;
        }

        this (ref this)
        {
                assert (x == x);
                postblit_counter++;
        }
}

alias RedBlackTree !(element) container;

void main ()
{
        auto f = new container ();
        element.postblit_counter = 0;
        foreach (i; 0..LIMIT)
        {
                f.insert (element (i));
        }
        writefln ("%s", element.postblit_counter);
}
-----

And now here is a C++ equivalent:

-----
#include <cstdio>
#include <set>
#include <stdint.h>

const int LIMIT = 100000;

using namespace std;

struct element
{
        static int postblit_counter;

        int64_t x;

        bool operator < (const element other) const
        {
                return (x < other.x);
        }

        element (int64_t nx)
        {
                x = nx;
        }

        element (const element & other)
        {
                postblit_counter++;
        }
};

int element::postblit_counter;

typedef set <element> container;

int main (void)
{
        container f;
        element::postblit_counter = 0;
        for (int i = 0; i < LIMIT; i++)
        {
                f.insert (element (i));
        }
        printf ("%d\n", element::postblit_counter);
        return 0;
}
-----

And the results are:
D2 (DMD 2.059, -O):             11,389,556
C++ (MinGW GCC 4.7.2, -O2):      3,072,387

As you can see, in order to insert 100,000 elements, D needs a few times more copy constructor calls than C++. However, as far as I know, the internal structure used by C++ std::set is the very same red-black tree! Am I doing something wrong? And if not, what is this cost paid for, are there any features that RedBlackTree possesses and STL set doesn't?

Personally, I don't see why at all we should call the copy constructor more than once per element. I mean, if we intend to build a generic data structure, we sure need an internal node object with some extra bytes (internal references and counters) per each element, at least in the case of a red-black tree. So why don't we just bind each element to that internal node once and for all, and then, as long as the node is in the structure, use the data contained in it only by reference? What do we achieve if we choose to use it by value somewhere?

And if the intended design is "use class references", will that create an overhead for garbage collector later on?

...well, thank you for reading it to this point.

-----
Ivan Kazmenko.

Reply via email to