Re: protocol for using InputRanges

Walter Bright Tue, 25 Mar 2014 14:46:39 -0700

On 3/25/2014 1:56 PM, monarch_dodra wrote:

http://dlang.org/phobos/std_range.html#isInputRange


The semantics of an input range (not checkable during compilation) are assumed
to be the following (r is an object of type R):
r.empty returns false iff there is more data available in the range.
r.front returns the current element in the range. It may return by value or by
reference. Calling r.front is allowed only if calling r.empty has, or would
have, returned false.
r.popFront advances to the next element in the range. Calling r.popFront is
allowed only if calling r.empty has, or would have, returned false.


I overlooked that. Thanks.

We want to appeal to the high performance coders. To maximize performance,
ranges should be optimized to the inner loop case, which is:

    while (!r.empty) { auto e = r.front; ... e ...; r.popFront(); }


This makes the assumption that r.front is copy constructible at all. It also
makes the assumption that you want to operate on a copy, rather than the actual
element in the range.

It's a reasonable requirement. If your range has an issue with this, it can return a pointer to the element, and the element can be a struct with access functions. Then, the pointer will work as well as a copy.
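
A minimal sketch of the pointer approach, for illustration only. The names Big, ByPointer and sumIds are hypothetical and do not come from Phobos or from this thread; the sketch assumes the elements live in a slice that outlives the range.

```d
import std.range.primitives : isInputRange;

// Hypothetical large element type with an access function.
struct Big
{
    int[64] payload;
    int id() const { return payload[0]; }
}

// Hypothetical range whose front returns a pointer to the element,
// so `auto e = r.front;` copies a pointer rather than a whole Big.
struct ByPointer
{
    Big[] data;

    @property bool empty() const { return data.length == 0; }
    @property Big* front() { return &data[0]; }
    void popFront() { data = data[1 .. $]; }
}

static assert(isInputRange!ByPointer);

// The inner loop from the post, operating on the pointer.
int sumIds(ByPointer r)
{
    int total = 0;
    while (!r.empty)
    {
        auto e = r.front;   // e is a Big*, cheap to copy
        total += e.id;      // member access goes through the pointer
        r.popFront();
    }
    return total;
}
```

The loop body is unchanged from the copy case; only the element type seen by the loop differs.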

Finally, it means having to declare a local object: It merely means shifting the
burden of caching from one context to another. If the object is large, chances
are you are better off just calling front instead of making a copy. Especially
if the loop is trivial.

This does come up as an issue, and is solvable by returning a pointer as I described. It's up to the designer of the range.

If you want high performance, then arguably, just provide an O(1) front, and an
O(1) empty.

I don't think the issues can be waved away so easily, or I wouldn't have brought this up.

Also, certain ranges, such as "filter" *must* access the front of the previous
range more than once. Unless you are suggesting we add a field for it to cache
the result of the previous range?

This is putting the cost where it belongs - when needed on the user of a range, rather than on all ranges. It's the "pay only for what you need" idea rather than "pay regardless".
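
As a rough illustration of an adapter paying for its own cache, here is a sketch of a filter that reads the source's front once per element and keeps the value in a field. CachingFilter and cachingFilter are hypothetical names; this is not how std.algorithm's filter is implemented, and it assumes callers check empty before calling front.

```d
import std.range.primitives;

// Hypothetical adapter: it pays for one cached element itself instead of
// requiring the source range to make repeated front calls cheap.
struct CachingFilter(alias pred, R)
    if (isInputRange!R)
{
    private R source;
    private ElementType!R cached;
    private bool haveItem;

    // Advance the source until an element satisfying pred is cached.
    private void fetch()
    {
        while (!haveItem && !source.empty)
        {
            auto e = source.front;   // the only read of source.front per element
            source.popFront();
            if (pred(e))
            {
                cached = e;
                haveItem = true;
            }
        }
    }

    @property bool empty() { fetch(); return !haveItem; }

    @property ElementType!R front()
    {
        assert(haveItem, "call empty before front");
        return cached;
    }

    void popFront()
    {
        fetch();            // make sure there is an element to discard
        haveItem = false;
    }
}

// Convenience wrapper so R is inferred from the argument.
auto cachingFilter(alias pred, R)(R source)
{
    return CachingFilter!(pred, R)(source);
}
```

A call such as cachingFilter!(x => x % 2 == 0)([1, 2, 3, 4, 5, 6]) can then be driven with the usual empty/front/popFront loop. Ranges that never feed such an adapter carry no cache field.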

With the additional proviso that ranges are passed to algorithms by value, so
they should be cheap to copy. Cheap to copy usually means them being small.

Unfortunately yes. That said, any range that does anything will have at least
two fields, one of which is a slice, or comparable to in terms of size, so it's
going to be big anyways. So you *are* better off passing by ref if you can
regardless, unless your range is *really* trivial.

Many very useful ranges are trivial. Or at least they should be. An array, for example, is a trivial range.

A) I know that my range is not empty, so I can skip calling empty.

Since front is guaranteed to succeed if !empty, this puts a requirement on
many ranges that they have a non-trivial constructor that 'primes the pump'.
Of course, priming may fail, and so construction may throw, which is not good
design.


If the prime fails, then the range can simply be marked as empty. Then if you
decide to skip calling empty anyways, it's your own fault.

Yes, one can add state flags to indicate failed construction, which I'll argue is an ugly design. After all, construction is supposed to construct an object or fail, not leave the 'constructed' object in a zombie state.

And lastly, it means that lazy ranges will be required to read the first
element, even if the range isn't then subsequently used, which defeats what
one would expect from a lazy range.


I'm not yet convinced of adding special code for ranges that aren't used. I've
heard of these kinds of ranges, but I've observed that when you declare one, it
almost always ends up being used. I don't think we should waste efforts on this
rare usecase.

It's not rare - it's the primary way ranges are used in C# Linq. Should we throw this entire category of use cases under the bus just to handle a convenience of not needing to call empty in some cases?

Evaluating an element "on use" as opposed to "1 instruction before use" doesn't
make much of a change in this context.

Except that it requires the use to start upon construction, which defeats any hope of separating construction of a pipeline from using it.
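
To make the distinction concrete, here is a small sketch of a range that does not touch its source at construction and reads the first element only when empty is first called. Counter and LazyTake are hypothetical names invented for this illustration; the sketch assumes the empty-then-front-then-popFront calling order discussed in this thread.

```d
// Hypothetical source that records how many times it has been read.
struct Counter
{
    int reads;
    int next() { return ++reads; }
}

// Hypothetical range yielding up to `remaining` values from the source.
// Construction only stores the pointer and the count; no read happens.
struct LazyTake
{
    Counter* src;
    int remaining;
    int current;
    bool primed;

    @property bool empty()
    {
        if (!primed && remaining > 0)
        {
            current = src.next();   // first read happens here, not in a constructor
            primed = true;
        }
        return !primed;
    }

    @property int front()
    {
        assert(primed, "call empty before front");
        return current;
    }

    void popFront()
    {
        assert(primed, "call empty before popFront");
        primed = false;
        --remaining;
    }
}
```

With this shape, a pipeline can be built in one place and consumed later (or not at all) without the source having been read in between.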

I've found that if you are creative enough, you can usually design the range in
such a way that it works efficiently, lazily, and without flags.


That's not been my experience.

I get where you are coming from, but it's simply not manageable in a generic
fashion, if you want to be able to preserve all the power and the diversity of
the ranges we have.

The protocol you are suggesting would prevent us from doing a lot of the lovely
things that ranges empower us to do.

Please show me such a case. Note that I've shown above that this power and diversity throws an entire use case category under the bus. Secondly, in my experiments with ranges, the power and diversity results in slower pipelines.
