Steven Schveighoffer wrote:

The thread discussing what to do for input ranges vs. forward ranges got me thinking.

The range concept may be defined backwards in terms of which is more specialized. Consider that an input range is always usable as a stream. But a stream is not easy to use as an input range (the range primitive).

The hierarchy of concepts is:

input range < forward range < bidirectional range < random-access range

Having length, having slicing, and infinity are orthogonal properties.

Case in point, a file. To fit into the most primitive range concept, it must define 3 functions:

front()
popFront()
empty()

empty is easy, it's just "am I at end of file"

But front is not so easy. In order to know what's at the front, you need to read it. And at that point you've altered the underlying file.

Not even empty is easy. If you're defining a file range that gives you e.g. whitespace-separated integers, you can't have empty return feof(p) because there might be only spaces through the end of file. So you need to cache one element ahead to be able to implement empty().

I've discussed this with Walter for the longest time. The correct primitive for an input range that needs to do no caching is:

Tuple!(T, "data", bool, "got") popNext();

When you call popNext it decides in the same place whether there is data, and also gives it to you. If "got" comes off as false, you know you're done.

There are two issues with this form. It's not easy to use (even if you play around with the signature by e.g. passing data or bool by reference) and it's not easy to inline for non-input ranges. The second problem could possibly be eliminated, but the first stays. It's just too clunky a primitive to use, you'll need temporaries and all that stuff.

Look at the implementation of front and popFront for a possible FileByChar implementation:

dchar front()
{
  if(!bufferValid)
    popFront();
  return buffer;
}

void popFront()
{
// read buffer from source, setting bufferValid if the range isn't empty by default
}

What sucks about this is, we have to introduce a buffer in the range, just because we can't look at data until we've popped it. Not only that, but calling front a file before anything is read requires a check to fill the buffer in case we haven't read anything yet. This could be alleviated by filling in the constructor, but it's still more complex than necessary. Consider also that the underlying stream might already be buffered, so we are buffering a buffer.

Yah, I agree. I've implemented a few of those in Phobos. It would be good to have a more solid solution. This problem, however, also collides with returning an rvalue versus a ref. A container wants to return a ref. An input range wants to return by value. Then it's difficult to use a container as an input range.

And finally, if you copy such a range, the buffer might be copied while the stream itself may not. this could result in strange garbage data.

I don't understand this. You could make sure copy does the right thing.

But since the primitives for input range are set by the compiler (it uses them to do foreach), we have to implement them to make our stream ranges friendly to foreach.

Round peg, meet square hole.

But what are the true requirements for iteration using foreach?

1. check if there's anything left
2. get the next element

Step 2 now is split into popFront and front. So a foreach loop is a rewritten for loop like this:

foreach(x; range)
{
  ...
}

translates to:
{
  auto _r = range;
  while(!_r.empty)
  {
    auto x = _r.front();
    _r.popFront();
    ...
  }
}

What if step 2 was one function? Call it popNext(), and make it equivalent to calling _r.front() and popFront() in one step on ranges that implement that method.

This will not solve all problems. It will improve things like defining ranges that read one character at a time from a stream. But a function that does read-and-check-for-empty in one shot is the true solution.

How does this work with foreach?

{
  auto _r = range;
  while(!_r.empty)
  {
    auto x = _r.popNext();
    ...
  }
}

Basically, the same code, one less line.

Consider that any range defined today with front() and popFront() can implement popNext (and popNext could be an external function if we can get 3015 resolved).

So what I think we may need is a different range primitive:

An iterable range defines: (name to be decided)

bool empty()
T popNext()

An input range is an iterable range that also defines:

T front();
popFront();

I think you are using "iterable range" instead of "input range" and "input range" instead of "forward range". This is compared to STL terminology, which I borrowed.

Now look at our FileByChar example as an iterable range:

T popNext()
{
   return source.get(); // no buffering required
}

And it works perfectly with the new foreach requirements.

And it correctly doesn't work with algorithms that require front and popFront.

That's great. Again, popNext integrated with check for empty is the best solution. The correct way to go is to define this:

T popNext(ref bool gotSomething);

and leave it to the discretion of the range if they prefer returning by reference:

ref T popNext(ref bool gotSomething);

(this might be good for many forward ranges.) Then we nicely ask Walter to allow inlining of such functions, and finally implement this in std.range:

ref ElementType!R popNext(R)(R r, ref bool gotSomething)
    if (isForwardRange!R)
{
    if (r.empty)
    {
        gotSomething = false;
        static typeof(return) dumbo;
        return dumbo;
    }
    auto result = &(r.front());
    r.popFront();
    gotSomething = true;
    return *result;
}

Admittedly this looks considerably messier than it ought to, and that's never a good sign. For starters, I could predict with accuracy the sneering remarks of certain posters who shall remain unnamed. Worse, they'd have a point (only) this one time :o). Messiness was one of the factors that made me decide to steer away from this design. A simpler solution is to just return by value:

ElementType!R popNext(R)(R r, ref bool gotSomething)
    if (isForwardRange!R)
{
    if (r.empty)
    {
        gotSomething = false;
        return typeof(return).init;
    }
    auto result = r.front;
    r.popFront();
    return result;
}

This looks a tad more sane. But then it copies data more than it strictly should, and for whom? For everything that's not strictly a file input, which is most things you want to iterate! Wrong default.

As of this time, I am undecided on what's the best way to go. Opinions are welcome.


Andrei

Reply via email to