On 10/21/2015 01:05 PM, Andrei Alexandrescu wrote:
I'm finally getting the cycles to get to thinking about Design by
Introspection containers. First off, there are several general
categories of containers. I think D should support all properly. One
question is which we go for in the standard library.
1. Functional containers.
These are immutable; once created, neither their topology nor their
elements may be observably changed. Manipulating a container entails
creating an entire new container, often based on an existing container
(e.g. append takes a container and an element and creates a whole new
container).
Internally, functional containers take advantage of common substructure
and immutability to share actual data. The classic resource for defining
and implementing functional containers is
http://www.amazon.com/Purely-Functional-Structures-Chris-Okasaki/dp/0521663504.
...
I still think those should be mutable by default in order to have
painless interchangeability with other value type containers. Why should
corresponding ephemeral and persistent containers have different interfaces?
I assume you envision code using those to look as follows?
FunSet!int a;
a=a.with_(1);
auto b=a;
a=a.with_(2);
a=a.with_(3);
// b = {1}, a={1,2,3};
I think this should be allowed too:
FunSet!int a;
a.insert(1);
auto b=a;
a.insert(2);
a.insert(3);
// b = {1}, a={1,2,3};
One of the two versions should be automatically implemented via UFCS.
2. Reference containers.
These have classic reference semantics (à la Java). Internally, they may be
implemented either as class objects or as reference counted structs.
They're by default mutable. Qualifiers should apply to them gracefully.
3. Eager value containers.
These are STL-style. Somewhat surprisingly I think these are the worst of the
pack; they expensively duplicate at the drop of a hat and need to be carefully
passed around by reference lest performance silently drops. Nevertheless, when
used as members inside other data structures value semantics might be the
appropriate choice. Also, thinking of them as values often makes code simpler.
By default eager value containers are mutable. They should support immutable
and const meaningfully.
4. Copy-on-write containers.
These combine the advantages of value and reference containers: you get
to think of them as values, yet they're not expensive to copy. Copying
only occurs by necessity upon the first attempt to change them.
...
IMO "1." ought to combine the advantages of value and reference
containers as well, just without any expensive copying at all, even when
updates happen.
The disadvantage is implementations get somewhat complicated. Also, they
are shunned in C++ because there is no proper support for COW; for
example, COW strings have been banned starting with C++11 which is quite
the bummer.
Together with Scott Meyers, Walter figured out a way to change D to
support COW properly. The language change consists of two attributes.
=======
I'll attempt to implement a few versions of each and see what they look
like. The question here is what containers are of interest for D's
standard library.
...
List:
- forward iteration
- bidirectional iteration
Stack:
- basic stack
- ordered stack [0]
Queue:
- basic queue
- heap
Set:
- hash set
- ordered set
- accumulating set [1]
- trie/radix tree
+ multiset versions
Map:
- hash map
- ordered map
- accumulating map [1]
- accumulating map with range update [2]
- trie/radix tree
- accumulating trie [1]
- accumulating trie with range update [2]
+ multimap versions
- array
- accumulating array [1]
- accumulating array with range update [2]
+ O(1) reset versions [3]
- rope
- accumulating rope [1]
- accumulating rope with range update [2]
[0] ordered stack: Push operations automatically pop the minimal number
of elements off the stack prior to pushing, such as to guarantee that
the elements on the stack remain ordered. The stack should expose a
sorted range in order to support binary search.
[1] accumulation: The (ordered!) data structure allows fast queries for
the result of some binary associative operations on the elements in a
certain range. (the allowed operations are determined in advance and
some intermediate results are automatically maintained). (for map, the
operation would be just on the values, not on the keys.) This is usually
quite easy support, but very useful.
[2] range update: here, the idea is that the data structure allows all
elements in a certain range to be updated. the updates are performed
lazily and have to be compatible with the associative operations (if any).
[3] fast reset: here the idea is that the map allows fast reset of its
values at the cost of some small additional overhead per lookup.
(destructors are called lazily.)