Re: protocol for using InputRanges

monarch_dodra Tue, 25 Mar 2014 14:04:29 -0700

On Tuesday, 25 March 2014 at 20:15:32 UTC, Walter Bright wrote:

> It's pretty clear that:
>
> 1. the protocol is COMPLETELY undocumented and undefined.


http://dlang.org/phobos/std_range.html#isInputRange

The semantics of an input range (not checkable during compilation) are assumed to be the following (r is an object of type R):

r.empty returns false iff there is more data available in the range.

r.front returns the current element in the range. It may return by value or by reference. Calling r.front is allowed only if calling r.empty has, or would have, returned false.

r.popFront advances to the next element in the range. Calling r.popFront is allowed only if calling r.empty has, or would have, returned false.
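As a minimal sketch of those documented semantics, here is a small range that satisfies isInputRange. The Counter type and its fields are hypothetical, made up for illustration; the asserts in front and popFront only spell out the "allowed only if !empty" rule and are not required by the protocol.

```d
import std.range : isInputRange;

// Hypothetical example range: counts from `current` up to (not including) `limit`.
struct Counter
{
    int current;
    int limit;

    // Returns false iff there is more data available.
    @property bool empty() const { return current >= limit; }

    // Only allowed while !empty.
    @property int front() const
    {
        assert(!empty, "front called on an empty Counter");
        return current;
    }

    // Only allowed while !empty.
    void popFront()
    {
        assert(!empty, "popFront called on an empty Counter");
        ++current;
    }
}

// The compile-time check only verifies that the three primitives exist;
// the semantic rules above cannot be checked by the compiler.
static assert(isInputRange!Counter);

unittest
{
    auto r = Counter(0, 3);
    int sum = 0;
    while (!r.empty)
    {
        sum += r.front;
        r.popFront();
    }
    assert(sum == 3); // 0 + 1 + 2
}
```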

> We want to appeal to the high performance coders. To maximize performance, ranges should be optimized to the inner loop case, which is:
>
> while (!r.empty) { auto e = r.front; ... e ...; r.popFront(); }

This makes the assumption that r.front is copy constructible at all. It also makes the assumption that you want to operate on a copy, rather than the actual element in the range.

Finally, it means having to declare a local object: it merely shifts the burden of caching from one context to another. If the object is large, chances are you are better off just calling front instead of making a copy, especially if the loop is trivial.

If you want high performance, then arguably, just provide an O(1) front and an O(1) empty.

Also, certain ranges, such as "filter", *must* access the front of the previous range more than once. Unless you are suggesting we add a field for it to cache the result of the previous range?
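To illustrate the point about "filter", here is a simplified sketch of a filtering range without a cache field. It is not the Phobos implementation: SimpleFilter, CountingSource and isEven are hypothetical names, and the call counter exists only to make the repeated accesses visible. The predicate has to read source.front to decide whether to keep an element, and the caller then reads the same element again through front.

```d
import std.range : isInputRange;

// Hypothetical source range that counts how often `front` is called.
struct CountingSource
{
    int[] data;
    size_t* frontCalls; // shared tally, so copies of the range update the same counter

    @property bool empty() const { return data.length == 0; }
    @property int front() { ++*frontCalls; return data[0]; }
    void popFront() { data = data[1 .. $]; }
}

bool isEven(int x) { return x % 2 == 0; }

// Simplified filter with no cached element (a sketch, not std.algorithm's filter).
struct SimpleFilter(alias pred, R) if (isInputRange!R)
{
    R source;

    this(R source)
    {
        this.source = source;
        skip(); // position on the first matching element
    }

    @property bool empty() { return source.empty; }

    // Second access to the same source element.
    @property auto front() { return source.front; }

    void popFront()
    {
        source.popFront();
        skip();
    }

    // First access: the predicate has to inspect source.front.
    private void skip()
    {
        while (!source.empty && !pred(source.front))
            source.popFront();
    }
}

unittest
{
    size_t calls;
    auto evens = SimpleFilter!(isEven, CountingSource)(
        CountingSource([1, 2, 3, 4], &calls));

    int[] seen;
    while (!evens.empty)
    {
        seen ~= evens.front;
        evens.popFront();
    }

    assert(seen == [2, 4]);
    // 4 predicate checks plus 2 reads by the caller: each kept element is read twice.
    assert(calls == 6);
}
```

Avoiding the second read would mean storing a copy of the element inside the filter, which is exactly the extra field (and extra range size) being questioned above.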

> With the additional proviso that ranges are passed to algorithms by value, so they should be cheap to copy. Cheap to copy usually means them being small.

Unfortunately yes. That said, any range that does anything will have at least two fields, one of which is a slice, or something comparable in size, so it's going to be big anyways. So you *are* better off passing by ref if you can regardless, unless your range is *really* trivial.


I agree that range sizes can be a problem.

> A) I know that my range is not empty, so I can skip calling empty.
>
> Since front is guaranteed to succeed if !empty, this puts a requirement on many ranges that they have a non-trivial constructor that 'primes the pump'. Of course, priming may fail, and so construction may throw, which is not good design.

If the prime fails, then the range can simply be marked as empty. Then if you decide to skip calling empty anyways, it's your own fault.
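A minimal sketch of that idea, assuming a hypothetical ParsedInts range that converts string tokens to ints: the constructor primes the first element, and if the conversion fails the range drops its remaining input so that empty reports true, instead of letting the exception escape from the constructor. Stopping silently at a bad token is a choice made for this example only; a real range might prefer to report the error some other way.

```d
import std.conv : to, ConvException;
import std.range : isInputRange;

// Hypothetical range: yields ints parsed from tokens and ends at the first bad token.
struct ParsedInts
{
    private string[] tokens;
    private int current;

    this(string[] tokens)
    {
        this.tokens = tokens;
        prime(); // never throws; a failed prime just leaves the range empty
    }

    // O(1): no separate flag, an exhausted or failed range has no tokens left.
    @property bool empty() const { return tokens.length == 0; }

    // O(1): the parsing work was already done in prime().
    @property int front() const
    {
        assert(!empty, "front called on an empty ParsedInts");
        return current;
    }

    void popFront()
    {
        assert(!empty, "popFront called on an empty ParsedInts");
        tokens = tokens[1 .. $];
        prime();
    }

    private void prime()
    {
        if (tokens.length == 0)
            return;
        try
        {
            current = tokens[0].to!int;
        }
        catch (ConvException e)
        {
            tokens = null; // priming failed: mark the range as empty
        }
    }
}

static assert(isInputRange!ParsedInts);

unittest
{
    int[] seen;
    for (auto r = ParsedInts(["1", "2", "x", "4"]); !r.empty; r.popFront())
        seen ~= r.front;
    assert(seen == [1, 2]);

    // Construction does not throw even when the very first token is bad.
    assert(ParsedInts(["oops"]).empty);
}
```

A caller that skips the empty check and calls front on the failed range trips the assert, which matches the "it's your own fault" reading above.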

> And lastly, it means that lazy ranges will be required to read the first element, even if the range isn't then subsequently used, which defeats what one would expect from a lazy range.

I'm not yet convinced of adding special code for ranges that aren't used. I've heard of these kinds of ranges, but I've observed that when you declare one, it almost always ends up being used. I don't think we should waste effort on this rare use case.

As for "lazy", in range terms, it mostly only means you calculate things one element at a time, as you go up the range chain, as opposed to processing the entire input data one transformation at a time.

Evaluating an element "on use" as opposed to "1 instruction before use" doesn't make much of a difference in this context.

> All this saves for the user is one call to empty for the entire algorithm, at a cost incurred with every iteration. I.e. it selects O(n) to save O(1).

If code that was actually meant to *do* something was in popFront() to begin with, then there'd be no extra overhead.

I've found that if you are creative enough, you can usually design the range in such a way that it works efficiently, lazily, and without flags.

> Hence, I propose that the protocol for using input ranges be defined as:
>
> while (!r.empty) { auto e = r.front; ... e ...; r.popFront(); }
>
> This makes it possible to build pipelines that are firehoses with no kinks or constrictions in them. It optimizes for the inner loop case, not boundary cases.

I get where you are coming from, but it's simply not manageable in a generic fashion, if you want to be able to preserve all the power and the diversity of the ranges we have.

The protocol you are suggesting would prevent us from doing a lot of the lovely things that ranges empower us to do.
