std.container.RedBlackTree versus C++ std::set

Ivan Kazmenko Wed, 13 Feb 2013 15:25:18 -0800

Hi!

I'm learning to use D collections properly, and I'm looking for a sorted data structure with logarithmic access time (i.e., a binary search tree will do, but a hash table would not help). As far as I can see, std.container.RedBlackTree is exactly what I need. However, I am not sure if I use it as intended, as its performance seems inferior to a C++ STL solution (e.g., std::set).

To be more specific, right now I wonder what is the best (or intended) way to store an object in the RedBlackTree: should it be a class reference, or a struct (passed by value), or something quirkier like an integer pointing into an array or a simple pointer? The rest of my program suggested using structs, but the whole thing turned out to be rather slow, and the profiler told me that these structs are being copied around much more than I anticipated.

And so I wrote a minimalistic test program to check the number of copy (postblit) constructor calls. Here is the D version:


-----
import std.container;
import std.stdio;

immutable int LIMIT = 100000;

struct element
{
        static int postblit_counter;

        long x;

        int opCmp (ref element other)
        {
                return (x > other.x) - (x < other.x);
        }

        this (long nx)
        {
                x = nx;
        }

        this (this)
        {
                assert (x == x);
                postblit_counter++;
        }
}

alias RedBlackTree !(element) container;

void main ()
{
        auto f = new container ();
        element.postblit_counter = 0;
        foreach (i; 0..LIMIT)
        {
                f.insert (element (i));
        }
        writefln ("%s", element.postblit_counter);
}
-----

And now here is a C++ equivalent:

-----
#include <cstdio>
#include <set>
#include <stdint.h>

const int LIMIT = 100000;

using namespace std;

struct element
{
        static int postblit_counter;

        int64_t x;

        bool operator < (const element other) const
        {
                return (x < other.x);
        }

        element (int64_t nx)
        {
                x = nx;
        }

        element (const element & other)
        {
                x = other.x; // copy the key, otherwise the set compares uninitialized data
                postblit_counter++;
        }
};

int element::postblit_counter;

typedef set <element> container;

int main (void)
{
        container f;
        element::postblit_counter = 0;
        for (int i = 0; i < LIMIT; i++)
        {
                f.insert (element (i));
        }
        printf ("%d\n", element::postblit_counter);
        return 0;
}
-----

And the results are:
D2 (DMD 2.059, -O):             11,389,556
C++ (MinGW GCC 4.7.2, -O2):      3,072,387

As you can see, in order to insert 100,000 elements, D needs roughly 3.7 times as many copy constructor calls as C++. However, as far as I know, the internal structure used by C++ std::set is the very same red-black tree! Am I doing something wrong? And if not, what is this cost paid for? Are there any features that RedBlackTree possesses and STL set doesn't?

Personally, I don't see why we should call the copy constructor more than once per element at all. I mean, if we intend to build a generic data structure, we surely need an internal node object with some extra bytes (internal references and counters) for each element, at least in the case of a red-black tree. So why don't we just bind each element to that internal node once and for all, and then, as long as the node is in the structure, use the data contained in it only by reference? What do we achieve if we choose to use it by value somewhere?
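To make the "one copy per element" expectation concrete, here is a minimal C++ sketch. The names ByValue, ByRef and count_copies are made up for this illustration. It contrasts a comparison operator that takes its argument by value (as the C++ test above does) with one that takes it by const reference. My assumption is that the by-value operator copies an element on every comparison, which would account for a part of the C++ count above, while the by-reference variant should only pay for placing the element into its node.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical element type: operator < takes its argument BY VALUE,
// so every comparison inside the set invokes the copy constructor.
struct ByValue
{
        static long copies;
        long long x;
        explicit ByValue (long long nx) : x (nx) {}
        ByValue (const ByValue & other) : x (other.x) { copies++; }
        bool operator < (const ByValue other) const { return x < other.x; }
};
long ByValue::copies = 0;

// Same type, but operator < takes a CONST REFERENCE,
// so comparisons do not copy anything.
struct ByRef
{
        static long copies;
        long long x;
        explicit ByRef (long long nx) : x (nx) {}
        ByRef (const ByRef & other) : x (other.x) { copies++; }
        bool operator < (const ByRef & other) const { return x < other.x; }
};
long ByRef::copies = 0;

// Inserts n distinct keys into a std::set <T> and returns
// how many times T's copy constructor was called on the way.
template <typename T>
long count_copies (int n)
{
        T::copies = 0;
        std::set <T> s;
        for (int i = 0; i < n; i++)
        {
                s.insert (T (i));
        }
        // Sanity check: all keys are distinct, so all were inserted.
        assert (s.size () == static_cast <std::size_t> (n));
        return T::copies;
}
```

Calling count_copies <ByValue> (LIMIT) and count_copies <ByRef> (LIMIT) from main and printing both results shows the difference. I would expect the by-reference count to stay close to one copy per inserted element, but that is a property of the particular standard library implementation, not something I can claim in general.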

And if the intended design is "use class references", will that create overhead for the garbage collector later on?
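For comparison, the "store a reference instead of the value" idea can be sketched in C++ as a set of pointers into separately owned storage. This is only a rough analogue: Item, ByPointee and count_copies_with_pointers are hypothetical names, the storage is a plain std::vector, and the sketch says nothing about what a garbage collector would cost in D.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical element type that counts its copy constructor calls.
struct Item
{
        static long copies;
        long long x;
        explicit Item (long long nx) : x (nx) {}
        Item (const Item & other) : x (other.x) { copies++; }
};
long Item::copies = 0;

// Orders pointers by the key of the object they point to.
struct ByPointee
{
        bool operator () (const Item * a, const Item * b) const
        {
                return a -> x < b -> x;
        }
};

// Builds n items in a vector, inserts pointers to them into a set,
// and returns how many Item copies were made in the whole process.
long count_copies_with_pointers (int n)
{
        Item::copies = 0;
        std::vector <Item> storage;
        storage.reserve (n); // reserve up front so the vector never reallocates
        for (int i = 0; i < n; i++)
        {
                storage.emplace_back (i); // constructed in place, no copy
        }
        // The set is declared after the storage, so it is destroyed first
        // and never holds dangling pointers.
        std::set <const Item *, ByPointee> s;
        for (int i = 0; i < n; i++)
        {
                s.insert (& storage[i]); // only the pointer is copied
        }
        assert (s.size () == static_cast <std::size_t> (n));
        return Item::copies;
}
```

The price of this approach is one extra indirection per comparison, and the storage has to outlive the set.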


...well, thank you for reading it to this point.

-----
Ivan Kazmenko.
