On Tuesday, 25 March 2014 at 20:15:32 UTC, Walter Bright wrote:
It's pretty clear that:

1. the protocol is COMPLETELY undocumented and undefined.

http://dlang.org/phobos/std_range.html#isInputRange

The semantics of an input range (not checkable during compilation) are assumed to be the following (r is an object of type R): r.empty returns false iff there is more data available in the range. r.front returns the current element in the range. It may return by value or by reference. Calling r.front is allowed only if calling r.empty has, or would have, returned false. r.popFront advances to the next element in the range. Calling r.popFront is allowed only if calling r.empty has, or would have, returned false.

We want to appeal to the high performance coders. To maximize performance, ranges should be optimized to the inner loop case, which is:

while (!r.empty) { auto e = r.front; ... e ...; r.popFront(); }

This makes the assumption that r.front is copy constructible at all. It also makes the assumption that you want to operate on a copy, rather than the actual element in the range.

Finally, it means having to declare a local object: It merely means shifting the burden of caching from one context to another. If the object is large, chances are you are better off just calling front instead of making a copy. Especially if the loop is trivial.

If you want high performance, then arguably, just provide a O(1) front, and a O(1) empty.

Also, certain ranges, such as "filter" *must* access the front of the previous range more than once. Unless you are suggesting we add a field for it to cache the result of the previous range?

With the additional proviso that ranges are passed to algorithms by value, so they should be cheap to copy. Cheap to copy usually means them being small.

Unfortunately yes. That said, any range that does anything will have at least two fields, one of which is a slice, or comparable to in terms of size, so it's going to be big anyways. So you *are* better off passing by ref if you can regardless, unless your range is *really* trivial.

I agree that range sizes can be a problem.

A) I know that my range is not empty, so I can skip calling empty.

Since front is guaranteed to succeed if !empty, this puts a requirement on many ranges that they have a non-trivial constructor that 'primes the pump'. Of course, priming may fail, and so construction may throw, which is not good design.

If the prime fails, then the range can simply be marked as empty. Then if you decide to skip calling empty anyways, it's your own fault.

And lastly, it means that lazy ranges will be required to read the first element, even if the range isn't then subsequently used, which defeats what one would expect from a lazy range.

I'm not yet convinced of adding special code for ranges that aren't used. I've heard of these kinds of ranges, but I've observed that when you declare one, it almost always ends up being used. I don't think we should waste efforts on this rare usecase.

As for "Lazy", in range terms, it mostly only means you calculate things element at once, as you go up the range chain. As opposed to processing the entire input data, one transformation at a time.

Evaluating an element "on use" as opposed to "1 instruction before use" doesn't make much of a change in this context.

All this saves for the user is one call to empty for the entire algorithm, at a cost incurred with every iteration. I.e. it selects O(n) to save O(1).

If code that was actually meant to *do* something was in popFront() to begin with, then there'd be no extra overhead.

I've found that if you are creative enough, you can usually design the range in such a way that it works efficiently, lazilly, and without flags.

Hence, I propose that the protocol for using input ranges be defined as:

while (!r.empty) { auto e = r.front; ... e ...; r.popFront(); }

This makes it possible to build pipelines that are firehoses with no kinks or constrictions in them. It optimizes for the inner loop case, not boundary cases.

I get where you are coming from, but it's simply not manageable in a generic fashion, if you want to be able to preserve all the power and the diversity of the ranges we have.

The protocol you are suggesting would prevent us from doing a lot of the lovely things that ranges empowers us with.

Reply via email to