Re: ideas about ranges

Andrei Alexandrescu Fri, 22 May 2009 08:50:14 -0700

Steven Schveighoffer wrote:

The thread discussing what to do for input ranges vs. forward ranges got me thinking.
The range concept may be defined backwards in terms of which is more specialized. Consider that an input range is always usable as a stream. But a stream is not easy to use as an input range (the range primitive).


The hierarchy of concepts is:

input range < forward range < bidirectional range < random-access range

Having length, having slicing, and infinity are orthogonal properties.

Case in point, a file. To fit into the most primitive range concept, it must define 3 functions:
front()
popFront()
empty()

empty is easy, it's just "am I at end of file"
But front is not so easy. In order to know what's at the front, you need to read it. And at that point you've altered the underlying file.

Not even empty is easy. If you're defining a file range that gives you e.g. whitespace-separated integers, you can't have empty return feof(p) because there might be only spaces through the end of file. So you need to cache one element ahead to be able to implement empty().
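To make the caching point concrete, here is a minimal sketch of such a range. It is only an illustration under stated assumptions: the name IntsByWhitespace is hypothetical, a string stands in for the file so the example is self-contained, and only non-negative integers are handled.

```d
import std.ascii : isDigit, isWhite;

// Hypothetical range over whitespace-separated non-negative integers.
// A string stands in for the file; this is an assumption of the sketch.
struct IntsByWhitespace
{
    private string src;   // unread remainder of the "file"
    private int cached;   // element read ahead of the caller
    private bool have;    // true while cached holds a valid element

    this(string s)
    {
        src = s;
        popFront();       // prime the cache so that empty is meaningful
    }

    @property bool empty() const { return !have; }
    @property int front() const { return cached; }

    void popFront()
    {
        // Skip whitespace first; only then do we know whether data remains.
        while (src.length && isWhite(src[0]))
            src = src[1 .. $];
        // Anything that is not a digit ends the range in this sketch.
        have = src.length > 0 && isDigit(src[0]);
        if (!have)
            return;
        cached = 0;
        while (src.length && isDigit(src[0]))
        {
            cached = cached * 10 + (src[0] - '0');
            src = src[1 .. $];
        }
    }
}
```

Note that empty cannot be answered by looking at the remaining length alone: trailing whitespace leaves bytes in the source but no element, which is why the element is read one step ahead.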

I've discussed this with Walter for the longest time. The correct primitive for an input range that needs to do no caching is:


Tuple!(T, "data", bool, "got") popNext();

When you call popNext it decides in the same place whether there is data, and also gives it to you. If "got" comes off as false, you know you're done.

There are two issues with this form. It's not easy to use (even if you play around with the signature by e.g. passing data or bool by reference) and it's not easy to inline for non-input ranges. The second problem could possibly be eliminated, but the first stays. It's just too clunky a primitive to use, you'll need temporaries and all that stuff.
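For illustration, this is roughly what calling code looks like with the tuple-returning form. The Countdown source, the Next alias, and sumAll are hypothetical names made up for this sketch; only the Tuple!(T, "data", bool, "got") shape comes from the signature above.

```d
import std.typecons : Tuple;

// Same shape as the signature above, with T fixed to int for the sketch.
alias Next = Tuple!(int, "data", bool, "got");

// Hypothetical source exposing only the combined primitive.
struct Countdown
{
    int n;

    Next popNext()
    {
        if (n <= 0)
            return Next(0, false);  // nothing left; data is a dummy value
        return Next(n--, true);     // yields n, then n-1, ..., down to 1
    }
}

int sumAll(Countdown c)
{
    int total = 0;
    // The caller needs a temporary and an explicit test on every step.
    for (auto t = c.popNext(); t.got; t = c.popNext())
        total += t.data;
    return total;
}
```

The temporary t and the repeated call in the loop header are the kind of clutter referred to above.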

Look at the implementation of front and popFront for a possible FileByChar implementation:
dchar front()
{
  if(!bufferValid)
    popFront();
  return buffer;
}

void popFront()
{
  // read buffer from source, setting bufferValid if the range isn't empty by default
}
What sucks about this is, we have to introduce a buffer in the range, just because we can't look at data until we've popped it. Not only that, but calling front on a file before anything is read requires a check to fill the buffer in case we haven't read anything yet. This could be alleviated by filling in the constructor, but it's still more complex than necessary. Consider also that the underlying stream might already be buffered, so we are buffering a buffer.

Yah, I agree. I've implemented a few of those in Phobos. It would be good to have a more solid solution. This problem, however, also collides with returning an rvalue versus a ref. A container wants to return a ref. An input range wants to return by value. Then it's difficult to use a container as an input range.

And finally, if you copy such a range, the buffer might be copied while the stream itself may not. This could result in strange garbage data.


I don't understand this. You could make sure copy does the right thing.

But since the primitives for input range are set by the compiler (it uses them to do foreach), we have to implement them to make our stream ranges friendly to foreach.
Round peg, meet square hole.

But what are the true requirements for iteration using foreach?

1. check if there's anything left
2. get the next element
Step 2 now is split into popFront and front. So a foreach loop is rewritten as a loop like this:
foreach(x; range)
{
  ...
}

translates to:
{
  auto _r = range;
  while(!_r.empty)
  {
    auto x = _r.front();
    _r.popFront();
    ...
  }
}
What if step 2 was one function? Call it popNext(), and make it equivalent to calling _r.front() and popFront() in one step on ranges that implement that method.

This will not solve all problems. It will improve things like defining ranges that read one character at a time from a stream. But a function that does read-and-check-for-empty in one shot is the true solution.

How does this work with foreach?

{
  auto _r = range;
  while(!_r.empty)
  {
    auto x = _r.popNext();
    ...
  }
}

Basically, the same code, one less line.

Consider that any range defined today with front() and popFront() can implement popNext (and popNext could be an external function if we can get 3015 resolved).


So what I think we may need is a different range primitive:

An iterable range defines: (name to be decided)

bool empty()
T popNext()

An input range is an iterable range that also defines:

T front();
popFront();

I think you are using "iterable range" instead of "input range" and "input range" instead of "forward range". This is compared to STL terminology, which I borrowed.

Now look at our FileByChar example as an iterable range:

T popNext()
{
   return source.get(); // no buffering required
}

And it works perfectly with the new foreach requirements.

And it correctly doesn't work with algorithms that require front and popFront.

That's great. Again, popNext integrated with check for empty is the best solution. The correct way to go is to define this:
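A small sketch of the proposed "iterable range" shape, with the proposed foreach lowering written out by hand since the compiler does not perform it. CharSource and collect are hypothetical names, and a string stands in for the stream so the example is self-contained.

```d
import std.range.primitives : isInputRange;

// Hypothetical "iterable range": it defines only empty and popNext.
struct CharSource
{
    private string data;  // stands in for the underlying stream

    this(string s) { data = s; }

    @property bool empty() const { return data.length == 0; }

    // Hands back the next element and consumes it in one step,
    // so no read-ahead buffer is needed.
    char popNext()
    {
        char c = data[0];
        data = data[1 .. $];
        return c;
    }
}

// Without front and popFront it does not qualify as an input range,
// so algorithms constrained on isInputRange reject it at compile time.
static assert(!isInputRange!CharSource);

string collect(CharSource r)
{
    string result;
    // Hand-written form of the proposed lowering.
    while (!r.empty)
    {
        auto x = r.popNext();
        result ~= x;
    }
    return result;
}
```

In this toy case empty is trivial because the length is known up front; for a real stream the check for emptiness still has the problem raised earlier in the thread.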


T popNext(ref bool gotSomething);

and leave it to the discretion of the range if they prefer returning by reference:


ref T popNext(ref bool gotSomething);

(this might be good for many forward ranges.) Then we nicely ask Walter to allow inlining of such functions, and finally implement this in std.range:


// r is taken by ref so that the caller's range is the one that advances
ref ElementType!R popNext(R)(ref R r, ref bool gotSomething)
    if (isForwardRange!R)
{
    if (r.empty)
    {
        gotSomething = false;
        static typeof(return) dumbo;
        return dumbo;
    }
    auto result = &(r.front());
    r.popFront();
    gotSomething = true;
    return *result;
}

Admittedly this looks considerably messier than it ought to, and that's never a good sign. For starters, I could predict with accuracy the sneering remarks of certain posters who shall remain unnamed. Worse, they'd have a point (only) this one time :o). Messiness was one of the factors that made me decide to steer away from this design. A simpler solution is to just return by value:


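// r is taken by ref so that the caller's range is the one that advances
ElementType!R popNext(R)(ref R r, ref bool gotSomething)
    if (isForwardRange!R)
{
    if (r.empty)
    {
        gotSomething = false;
        return typeof(return).init;
    }
    auto result = r.front;
    r.popFront();
    gotSomething = true; // report success, as the ref-returning version does
    return result;
}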

This looks a tad more sane. But then it copies data more than it strictly should, and for whom? For everything that's not strictly a file input, which is most things you want to iterate! Wrong default.

As of this time, I am undecided on what's the best way to go. Opinions are welcome.



Andrei
