On 10/6/10 11:34 AM, Andrei Alexandrescu wrote:
I think ranges are a great thing, having simplicity as one of their
advantages. In the beginning they were indeed rather simple:

struct MyRange {
    bool empty();
    ref Widget front();
    void popFront();
}

with the appropriate refinements for bidirectional and random-access ranges.
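
For reference, the bidirectional refinement adds back() and popBack(), and the random-access refinement adds opIndex() (plus, typically, length()). A minimal sketch of those refinements at this early stage, in the spirit of MyRange above:

struct MyBidirectionalRange {
    bool empty();
    ref Widget front();
    void popFront();
    ref Widget back();           // access the last element
    void popBack();              // drop the last element
}

struct MyRandomAccessRange {
    bool empty();
    ref Widget front();
    void popFront();
    ref Widget back();
    void popBack();
    ref Widget opIndex(size_t);  // r[i]
    size_t length();             // usually required alongside opIndex
}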

Then there was the need to distinguish one-pass, "input" ranges (e.g.
files) from many-pass, "forward" ranges. So the "save" function was born
for forward ranges and above:

struct MyRange {
    bool empty();
    ref Widget front();
    void popFront();
    MyRange save();
}

Then @property came about, which required extra adornments:

struct MyRange {
    @property bool empty();
    @property ref Widget front();
    void popFront();
    @property MyRange save();
}

Then some ranges were unable to return ref, but they could accept
assignment. I call these /sealed ranges/ because they "seal" the address
of their elements, making it inaccessible:

struct MyRange {
    @property bool empty();
    @property Widget front();
    @property void front(Widget);
    void popFront();
    @property MyRange save();
}

This made bidirectional and random-access range interfaces quite big
because now we're talking about back() (two versions), opIndex(),
opIndexAssign() and opIndexOpAssign().
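
Spelled out, a sealed random-access range ends up with something like this (a sketch, not the exact Phobos requirements):

struct MySealedRandomAccessRange {
    @property bool empty();
    @property Widget front();              // get by value
    @property void front(Widget);          // set
    void popFront();
    @property Widget back();               // get by value
    @property void back(Widget);           // set
    void popBack();
    Widget opIndex(size_t);                // r[i]
    void opIndexAssign(Widget, size_t);    // r[i] = w
    void opIndexOpAssign(string op)(Widget, size_t);  // r[i] += w etc.
    @property size_t length();
    @property MySealedRandomAccessRange save();
}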

Until now I think we're solid on design decisions. The discussion starts
here.

And then there was this nagging thing which is well understood in the
C++ world. A sealed range returns by value and accepts by value. But in
C++, an object's copy constructor is arbitrarily expensive. The library
must be designed in accordance with that and carefully navigate around
the issue.

For example, swap is a fundamental primitive used in many algorithms. It
should swap objects at constant cost. Indeed, for references, swap is
easy to implement:

void swap(T)(ref T lhs, ref T rhs)
{
    assert(!pointsTo(lhs, rhs) && !pointsTo(rhs, lhs)
        && !pointsTo(lhs, lhs) && !pointsTo(rhs, rhs));
    T tmp = move(lhs);
    lhs = move(rhs);
    rhs = move(tmp);
}

or similar. However, a sealed range does not offer references, so trying
e.g.

swap(r1.front, r2.front);

will not work. This is a problem.

To solve that problem, I introduced moveFront(), moveBack(), and
moveAt(size_t), all of which destructively read the front, back, or an
indexed element respectively off the range. Then you can swap r1.front
with r2.front at constant cost like this:

T tmp = r1.moveFront();
r1.front = r2.moveFront();
r2.front = move(tmp);
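
Those three lines can be wrapped into a generic helper; swapFronts below is a hypothetical name, not a Phobos function, and it assumes both ranges define moveFront() as described above and expose a front setter:

import std.algorithm : move;
import std.range : ElementType;

// Swap the front elements of two sealed ranges at constant cost,
// using the destructive-read primitive moveFront().
void swapFronts(R1, R2)(ref R1 r1, ref R2 r2)
    if (is(ElementType!R1 == ElementType!R2))
{
    auto tmp = r1.moveFront();   // destructively read r1's front
    r1.front = r2.moveFront();   // destructively read r2's front into r1
    r2.front = move(tmp);        // move the temporary into r2's front
}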

All of this works and is rock-solid, but it does load the range
interface considerably. To a newcomer without the background above, a
full-fledged range definition may look quite loaded.

One simplification would be to decree that Phobos (and D in general)
shun objects with eager copying: any this(this) could be considered
constant cost. That would have several consequences:

1. It would simplify all ranges and many algorithms because there's no
more need for moveXxx and swapping can be done the old-fashioned way
(with assignments in inline code):

T tmp = r1.front;
r1.front = r2.front;
r2.front = tmp;

In general, many things become considerably easier if you can simply
assume that copying objects around does not count toward the complexity
requirements or the practical cost of your algorithm.

2. It would force certain types (such as BigInt) that allocate resources
and have value semantics to resort to reference counting (see the sketch
after this list).

3. It would give more legitimacy to sealed objects that return data by
value (as opposed to reference). I believe sealed objects are very
important for safety.

4. It would be a definite departure from C++, where all value copies are
considered of arbitrary cost. This would provide a convenient straw-man
for naysayers (e.g. "Hey, D calls the copy constructor even more often
than C++! No thanks, I'll stick with C++0x which solves it all with
rvalue references").

5. It would make things look and feel familiar to people coming from
languages other than C++, who seldom need to define a costly this(this)
at all.
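
To illustrate point 2, here is a minimal sketch of the shape a BigInt-like value type would take so that its this(this) is constant cost; the names and layout are made up for illustration, and this is not how std.bigint is actually implemented:

struct CheapBigInt
{
    private struct Payload
    {
        size_t refs;
        uint[] digits;   // the arbitrary-precision data, shared by all copies
    }
    private Payload* p;

    this(this)   // constant cost: bump a counter, never copy the digits
    {
        if (p) ++p.refs;
    }

    ~this()
    {
        if (p && --p.refs == 0)
        {
            // last owner: release the payload (with a concurrent GC this
            // decrement would have to be interlocked; see the follow-up below)
        }
    }
}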

Please discuss.


Andrei

Thanks to all for a very illuminating discussion. The dialog revealed that the topic of cheap copy construction is oddly linked with the topic of garbage collection: garbage collection calls destructors concurrently with other code, which means reference-counting code must be interlocked, which makes it not so cheap.
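
To make that cost concrete, here is a sketch of the difference (release is a hypothetical helper, not library code); atomicOp!"-=" yields the result of the subtraction, so it doubles as the "did we hit zero" test:

import core.atomic : atomicOp;

// Plain decrement: fine while copies and destruction stay in one thread.
void release(ref uint count)
{
    if (--count == 0) { /* free the payload */ }
}

// Interlocked decrement: needed once the GC may run the destructor from
// another thread; this cost is paid on every copy and every destruction.
void release(ref shared uint count)
{
    if (atomicOp!"-="(count, 1) == 0) { /* free the payload */ }
}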

This is only a manifestation of a deeper problem: destructors can execute arbitrary code and, if run in an arbitrary thread, can cause all sorts of deadlocks and essentially render "shared" inoperative.

Walter and I discussed two possibilities this morning:

1. Never call class destructors (or never call them if more than one thread is running)

2. Each destructor should be called in the thread that created the object.

The second solution is quite attractive. It can be implemented by associating each thread with a worklist - a list of objects to destroy. The GC can run in any thread, but it will not call destructors. Instead, it adds the objects to be destroyed to the worklists of the threads that created them (this means we need to add a field to each object holding the thread ID of the creating thread). When client code calls new, the allocation routine will first inspect the current thread's worklist and call the pending destructors if needed.
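
A rough sketch of the bookkeeping this implies; every name here is made up (this is not an existing druntime API), and error handling is omitted:

import core.sync.mutex : Mutex;
import core.thread : Thread, ThreadID;

// One deferred destruction: the object and the code that destroys it.
struct PendingDtor
{
    void* obj;
    void function(void*) dtor;
}

__gshared PendingDtor[][ThreadID] worklists;   // one list per creating thread
__gshared Mutex worklistLock;

shared static this() { worklistLock = new Mutex; }

// Called by the collector from whatever thread it runs in: instead of
// calling the destructor, queue it for the thread that created the object
// (whose ID would come from the per-object field mentioned above).
void deferDestruction(void* obj, void function(void*) dtor, ThreadID creator)
{
    worklistLock.lock();
    scope (exit) worklistLock.unlock();
    if (auto list = creator in worklists)
        *list ~= PendingDtor(obj, dtor);
    else
        worklists[creator] = [PendingDtor(obj, dtor)];
}

// Called at the top of the allocation routine: run whatever destructors
// other threads have deferred to the current thread.
void drainWorklist()
{
    PendingDtor[] mine;
    worklistLock.lock();
    if (auto list = Thread.getThis().id in worklists)
    {
        mine = *list;
        *list = null;
    }
    worklistLock.unlock();
    foreach (d; mine)
        d.dtor(d.obj);
}

In this scheme, new calls drainWorklist() before allocating, so deferred destructors always run in the thread that created their objects.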

I think this can be made to work and has good properties, although a fair number of details need to be figured out. Please share any thoughts you may have.


Andrei
