Steven Schveighoffer wrote:
On Sun, 03 Jan 2010 00:49:08 -0500, Andrei Alexandrescu
<[email protected]> wrote:
Steven Schveighoffer wrote:
My theory is, given this list of ranges, if you pair them with an
algorithm that requires save capability, you wouldn't want to use
that algorithm on it anyways (kinda like the consume example).
Why gratuitously limit the design? You're asking to replace this:
R save() { return this; }
with:
enum thisIsAForwardRange = true;
Is there a reason? The former leaves in flexibility. The latter
doesn't, for no good reason.
Well, one thing you could do is:
enum thisIsAnInputRange = true;
and then no special implementation is needed for normal forward ranges.
The other point is that there is no special treatment needed inside
algorithms -- the risk of forgetting to use save at the right points of
the algorithm is higher than the risk of forgetting to say
isForwardRange!(R) at the beginning of the function.
isForwardRange will be defined to yield true if and only if the range
defines save. But I see your point - user code only asserts
isForwardRange and then does not bother to use save(), just copies stuff
around in confidence that copying does the right thing.
Thanks for this insight. I don't see how to reconcile that with class
ranges and covariance.
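For later readers: the design that eventually shipped in Phobos makes
isForwardRange exactly the "defines save" test. A minimal sketch using
today's std.range.primitives (these names postdate this thread):

```d
import std.range.primitives : isInputRange, isForwardRange;

// Minimal sketch: a forward range is an input range that also defines
// save(), returning an independent copy at the same position.
struct Counter
{
    int n;
    @property bool empty() const { return false; }
    @property int front() const { return n; }
    void popFront() { ++n; }
    @property Counter save() { return this; } // copying preserves position
}

static assert(isInputRange!Counter);
static assert(isForwardRange!Counter); // true if and only if save exists
```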
Not having opSlice be part of the interface itself does not preclude a
container from implementing opSlice, and does not preclude using its
ranges in std.algorithm. If I'm not mistaken, all functions in
std.algorithm rely on compile-time interfaces. opApply allows for full
input range functionality for things like copying and outputting where
templates may not be desired.
The point is how much container-independent code can someone write by
using the Container interface. If all you have is a Container, you can't
use it with any range algorithm.
BTW, the primitives in dcollections are:
clear(); // clear all elements
remove(V v); // remove an element
Search and remove? That's an odd primitive. Why wouldn't you offer an
interface for iteration that allows an algorithm for search, and a
primitive for positioned removal?
Search and positioned removal are also primitives, but not defined on
the interface. remove was a primitive on Tango's containers, and
dcollections was originally meant to be a replacement for Tango's
containers.
I think the point is, if you have an interface reference, what would be
the minimum functionality needed so that you could use a container
without knowing its implementation.
Yes, and I think remove(V v) does not belong to the minimum
functionality. It combines two functions (search and remove) and raises
questions such as what happens to duplicate elements.
contains(V v); // returns whether an element is contained in the collection
I don't think this belongs to primitives. It's O(n) for many
containers and again it's a generic algorithm, not a member.
Again, it's part of the minimal usable interface. It's not a generic
algorithm, because some containers can implement this more efficiently.
But this is exactly what I believe to be a mistake: you are abstracting
away algorithmic complexity.
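The two positions can be sketched with today's Phobos (which postdates
this thread): linear membership testing stays a generic algorithm over
any range, while only a container with better-than-linear lookup
exposes membership as a primitive:

```d
import std.algorithm.searching : canFind;
import std.container.rbtree : redBlackTree;

void main()
{
    // Generic linear search: works on any input range, O(n).
    auto arr = [3, 1, 4, 1, 5];
    assert(arr.canFind(4));

    // Specialized container: membership is a primitive, O(log n).
    auto tree = redBlackTree(3, 1, 4, 1, 5);
    assert(4 in tree);
    assert(!(9 in tree));
}
```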
Plus, to use the generic algorithms, you would need to use interfaces as
ranges which I think are completely useless.
Why?
length(); // the length of the collection
That's not a primitive either. std.algorithm.walkLength is. For me,
all sorts of red flags and alarm buzzers go off when primitives are
guaranteed that can't be implemented efficiently except by a subset of
containers. You can't discount complexity as an implementation detail.
All current dcollection containers have O(1) length.
Some containers can't define O(1) length conveniently.
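In the design Phobos settled on (names postdate the thread), length
remains a primitive only where it is O(1), and std.range.walkLength is
the generic O(n) fallback for any range:

```d
import std.algorithm.iteration : filter;
import std.range : hasLength, iota, walkLength;

void main()
{
    // A filtered range cannot know its length up front...
    auto evens = iota(0, 10).filter!(x => x % 2 == 0);
    static assert(!hasLength!(typeof(evens))); // no O(1) length primitive

    // ...so the generic algorithm counts by iterating, in O(n).
    assert(evens.walkLength == 5);
}
```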
dup(); // duplicate the collection
opApply(int delegate(ref V v) dg); // iterate the collection
opApply(int delegate(ref bool doPurge, ref V v) dg); // purge the
collection
That means it covers only empty in your list of must-have functions
(via length() == 0).
How do you implement length() for a singly-linked list? Is empty()
going to take O(n)?
First, dcollections' list implementation is doubly linked because all
collections are forward and backward iterable.
Second, even for singly linked lists, you can have either O(1) length or
O(1) splicing (consuming a linked list range into another linked list).
Dcollections' default linked list implementation uses O(1) length since I
think splicing is a specialized requirement.
Right. The question is how much pressure Container is putting on the
implementation. I'd rather leave it to the list implementation to decide
to store the length or not.
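The trade-off under discussion can be sketched with a hypothetical
singly linked list (not dcollections' actual code): storing a count
makes length O(1) at the cost of bookkeeping on every mutation, and
O(1) splicing is lost because a spliced-in chain must be walked to keep
the count correct:

```d
// Hypothetical sketch, not dcollections' implementation.
struct SList(T)
{
    static struct Node { T value; Node* next; }
    Node* head;
    size_t count; // stored length: O(1) length(), O(1) upkeep per insert

    void insertFront(T v)
    {
        head = new Node(v, head);
        ++count;
    }

    @property size_t length() const { return count; }

    // The price: splicing a foreign chain is O(n) in the chain's
    // length, because count must be kept correct.
    void spliceFront(Node* chain)
    {
        if (chain is null) return;
        auto tail = chain;
        size_t n = 1;
        while (tail.next !is null) { tail = tail.next; ++n; }
        tail.next = head;
        head = chain;
        count += n;
    }
}

void main()
{
    SList!int s;
    s.insertFront(2);
    s.insertFront(1);
    assert(s.length == 2);
}
```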
Add is not a primitive because the Map collections shouldn't assign
some random key to the new element. removeAny is defined only on
sets and multisets, but I'm not sure that it couldn't be moved to
Collection, in fact, I'll probably do that.
add is a primitive that takes Tuple!(K, V) as the element type.
How do you define that on Container(V)? On Map(K, V), set(K k, V v) is
an interface method.
Map!(K, V) has Tuple!(K, V) as its element type.
What you can do is define Map(K, V) as inheriting Container(Tuple!(K,
V)), but then trying to use the container functions is very cumbersome.
In dcollections, Map(K, V) inherits Collection(V).
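The design Andrei describes can be sketched as follows (hypothetical
interface names, using std.typecons.Tuple): the map's element type is
the key/value pair, so Container's primitives apply unchanged, and
set(k, v) is added as map-specific sugar:

```d
import std.typecons : Tuple;

// Hypothetical sketch of the design under discussion.
interface Container(V)
{
    void add(V v);            // element-type-generic primitive
    @property size_t length();
}

// A map *is* a container of key/value tuples, so add() takes
// Tuple!(K, V) and all Container-based code works on maps too.
interface Map(K, V) : Container!(Tuple!(K, V))
{
    void set(K k, V v);       // map-specific convenience primitive
}
```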
Note that it's missing begin and end which are defined on every
single container type (i.e. the equivalent of the all-elements
range). This is because those primitives return a struct that is
different for every container type.
So you can't write container-independent iteration code unless you use
opApply, in which case composition becomes tenuous.
No, you can easily write container-independent iteration as long as you
have the implementation.
In this context: container-independent = using the Container interface.
This is the whole purpose of creating a container hierarchy. If the
container design fosters knowing the implementation, maybe a class
hierarchy is not the right choice in the first place.
If you use interfaces you can write opApply wrappers to do different
things. I'm not sure what you mean by composition.
For example, compose ranges a la retro or take.
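Composition here means wrappers like those in std.range; they nest
freely because each operates on the compile-time range interface of the
one beneath it:

```d
import std.algorithm.comparison : equal;
import std.range : iota, retro, take;

void main()
{
    auto r = iota(1, 100);                     // 1, 2, 3, ...
    assert(equal(r.take(3), [1, 2, 3]));       // lazily take a prefix
    assert(equal(r.take(3).retro, [3, 2, 1])); // compose: reverse it
}
```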
It also surpasses opSlice via opApply, since all an input range can
do anyways is iterate. In fact, opApply is more powerful because you
can change elements and remove elements (via purging). Plus it's
more efficient than a range-via-interface.
An input range is a subset of other (more powerful) ranges. It's also
much more flexible. I agree that calling range primitives via
interfaces can become an efficiency liability.
How is it more flexible? You can't replace data, and you can't remove
data while iterating, both of which are possible with dcollection's
primitives. If you have a Container(E) which defines InputRange!E
opSlice, how do you get at the more refined range type? Casting?
You can replace data by assigning to range's elements. Removal is done
via positioned remove (which I think needs to be a primitive).
I see a range as being useful for iteration or algorithms, but not
for general use. A great example is AAs. Would you say that an AA
*is* a range or should *provide* a range? If it is a range, does
that mean you remove elements as you call popFront? Does that make
any sense? If it doesn't, then what happens if you add elements
through another alias to that AA?
An AA provides several ranges - among which byKey and byValue.
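In today's D this is literal: built-in AAs provide byKey and byValue
ranges, and iterating a range does not consume or mutate the AA itself:

```d
import std.algorithm.sorting : sort;

void main()
{
    int[string] aa = ["one": 1, "two": 2];

    foreach (k; aa.byKey) {} // iteration consumes a range, not the AA
    assert(aa.length == 2);  // the AA is unchanged afterwards

    auto keys = aa.keys;     // eager array copy of the keys
    sort(keys);
    assert(keys == ["one", "two"]);
}
```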
I misunderstood your statement "[a container hierarchy] does need
range interfaces." I thought you meant containers themselves need to
implement the range interface, I see now that isn't the case, so my bad.
Yah, they'd offer it as a result of opSlice(). Covariant return types
will ensure there's no virtual call when you know what container you
operate on.
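A sketch of that arrangement (hypothetical hierarchy, using
std.range.interfaces, which postdates the thread): the base interface
promises at least an input range, container-independent code iterates
through it, and a concrete container may narrow the return type
covariantly so no virtual calls remain once the static type is known:

```d
import std.algorithm.comparison : equal;
import std.range.interfaces : InputRange, inputRangeObject;

interface Container(E)
{
    InputRange!E opSlice(); // base promise: at least an input range
}

class ArrayContainer(E) : Container!E
{
    E[] data;
    this(E[] d) { data = d; }
    // A concrete container could declare a more derived range class
    // here (covariant return) to skip virtual calls for known types.
    InputRange!E opSlice() { return inputRangeObject(data); }
}

void main()
{
    Container!int c = new ArrayContainer!int([1, 2, 3]);
    assert(equal(c[], [1, 2, 3])); // container-independent iteration
}
```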
Not having opSlice on the interface guarantees you never have a virtual
call for iteration :) opApply mitigates the virtual call on the interface.
And takes away the ability to compose ranges and to use algorithms with
the container.
Above all: the primitive set for a container must be a small set of
functions that (a) can be implemented by all containers within
reasonable efficiency bounds, and (b) are container-specific, not
generic. IMHO any container design that defines a search(Element) as a
primitive is misguided. Searching is not a container primitive - it's
an algorithm. Only more specialized containers (e.g. trees, hashes
etc.) can afford to define search(Element) as a primitive. Linear
search is a generic algorithm that works the same for everyone. It
does not belong as a method of any container.
If the minimal container design isn't usable without std.algorithm, then
I don't think it's worth having interfaces.
Why?
I think the other way: if the minimal container design is unusable with
std.algorithm, the design took a wrong turn somewhere.
If you need std.algorithm,
you need the full implementation of the container because it's a
compile-time interface.
How did you reach that conclusion? std.algorithm uses a syntactic
interface that is obeyed by interfaces too. There's no problem.
Interface ranges are something that should be avoided; it's like having
a programming language where everything has to be a class.
I disagree. The negation of an implementation dogma can be just as
limiting as the dogma. The way I see it, a design defines some
desiderata. Then it does whatever it takes to fulfill them.
If one desideratum is to use a class hierarchy to write
container-independent code, then interface ranges naturally follow.
There are no ifs and buts about it.
What you are saying seems completely incorrect to me: "since not all
containers can implement fast search, I'm going to guarantee that *all*
containers use a linear search via their interface.
This is a misunderstanding. In the STL linear containers don't define
find(), but associative containers do. That is the correct design.
*AND* I want to
make each loop in the search algorithm call 3 virtual functions!"
Not necessarily. This happens only if you use the Container interface to
write container-independent code. It is the cost it takes for the design
to fulfill its desiderata.
How
is that better than a search function that guarantees linear performance
but gives the option of being as fast as possible with no loop virtual
calls?
It is better because it doesn't write off search complexity as an
implementation detail. "Linear or better" is not a good guarantee. A
good guarantee is "From this node of the hierarchy down, this primitive
is defined to guarantee O(log n) search". Linear search is a generic
algorithm and does not belong to any container.
Andrei