On Fri, 22 May 2009 11:48:34 -0400, Andrei Alexandrescu <[email protected]> wrote:

Steven Schveighoffer wrote:
The thread discussing what to do for input ranges vs. forward ranges got me thinking. The range concept may be defined backwards in terms of which is more specialized. Consider that an input range is always usable as a stream. But a stream is not easy to use as an input range (the range primitive).

The hierarchy of concepts is:

input range < forward range < bidirectional range < random-access range

Having length, having slicing, and infinity are orthogonal properties.

Right, a stream doesn't fit really well within any of those categories, because it's even more simple than an input range. That was my point. You have to bolt functionality and a buffer on top of it to get it to fit the model. Maybe this is inevitable, I was wondering if it would be easier to deal with if the range paradigm had another simpler construct than input range.

Case in point, a file. To fit into the most primitive range concept, it must define 3 functions:
 front()
popFront()
empty()
 empty is easy, it's just "am I at end of file"
But front is not so easy. In order to know what's at the front, you need to read it. And at that point you've altered the underlying file.

Not even empty is easy. If you're defining a file range that gives you e.g. whitespace-separated integers, you can't have empty return feof(p) because there might be only spaces through the end of file. So you need to cache one element ahead to be able to implement empty().

Even a stream that does nothing but read straight characters, but has no idea how long the stream is, such as stdin, cannot specify empty without trying to read more data. I see your point.


I've discussed this with Walter for the longest time. The correct primitive for an input range that needs to do no caching is:

Tuple!(T, "data", bool, "got") popNext();

When you call popNext it decides in the same place whether there is data, and also gives it to you. If "got" comes off as false, you know you're done.

Yes, I see now that my model won't cut it. I had a feeling that my model wasn't quite correct, but as you say, yours still is clunky.

Looking at other iteration models:

C#: similar to D ranges, but moveNext (a.k.a. popFront) is required to start the enumeration. (yuck!) Java: This is similar to what I was proposing: only has "get next element" and "has more elements".

I'm not sure if the right answer has been found by any of these yet. I can feel there is a solution that should be simple, and accurately reflect the underlying objects' API, but I have to think more about it.

What sucks about this is, we have to introduce a buffer in the range, just because we can't look at data until we've popped it. Not only that, but calling front a file before anything is read requires a check to fill the buffer in case we haven't read anything yet. This could be alleviated by filling in the constructor, but it's still more complex than necessary. Consider also that the underlying stream might already be buffered, so we are buffering a buffer.

Yah, I agree. I've implemented a few of those in Phobos. It would be good to have a more solid solution. This problem, however, also collides with returning an rvalue versus a ref. A container wants to return a ref. An input range wants to return by value. Then it's difficult to use a container as an input range.

don't you mean the other way around? That is, it's easy to convert a ref into a value, but not vice versa.

But since the primitives for input range are set by the compiler (it uses them to do foreach), we have to implement them to make our stream ranges friendly to foreach.
 Round peg, meet square hole.
 But what are the true requirements for iteration using foreach?
 1. check if there's anything left
2. get the next element
Step 2 now is split into popFront and front. So a foreach loop is a rewritten for loop like this:
 foreach(x; range)
{
  ...
}
 translates to:
{
  auto _r = range;
  while(!_r.empty)
  {
    auto x = _r.front();
    _r.popFront();
    ...
  }
}
What if step 2 was one function? Call it popNext(), and make it equivalent to calling _r.front() and popFront() in one step on ranges that implement that method.

This will not solve all problems. It will improve things like defining ranges that read one character at a time from a stream. But a function that does read-and-check-for-empty in one shot is the true solution.

Yes, agreed. I was focusing on not having to keep around transient data to avoid reordering or caching issues. But it does need to be all-in-one as you have shown.

How does this work with foreach?
 {
  auto _r = range;
  while(!_r.empty)
  {
    auto x = _r.popNext();
    ...
  }
}
 Basically, the same code, one less line.
Consider that any range defined today with front() and popFront() can implement popNext (and popNext could be an external function if we can get 3015 resolved).
 So what I think we may need is a different range primitive:
 An iterable range defines: (name to be decided)
 bool empty()
T popNext()
 An input range is an iterable range that also defines:
 T front();
popFront();

I think you are using "iterable range" instead of "input range" and "input range" instead of "forward range". This is compared to STL terminology, which I borrowed.

I don't know what the terminology should be. I was using your terminology. when I said input range, I meant input range as you defined it on your std.range documentation. An input range has the following methods:

T front()
void popFront()
bool empty()
and by composition: T popNext() (this wasn't in the docs, but it's something that we're throwing around here)

An "iterable" range, or one that does not have persistent access to elements and must modify the range and get data at the same time, was supposed to be a subset of that. I was just defining it as T popNext() and empty(), but it looks like those have to be combined into one function as you say.

(this might be good for many forward ranges.) Then we nicely ask Walter to allow inlining of such functions, and finally implement this in std.range:

ref ElementType!R popNext(R)(R r, ref bool gotSomething)
     if (isForwardRange!R)
{
     if (r.empty)
     {
         gotSomething = false;
         static typeof(return) dumbo;
         return dumbo;
     }
     auto result = &(r.front());
     r.popFront();
     gotSomething = true;
     return *result;
}

Admittedly this looks considerably messier than it ought to, and that's never a good sign. For starters, I could predict with accuracy the sneering remarks of certain posters who shall remain unnamed. Worse, they'd have a point (only) this one time :o). Messiness was one of the factors that made me decide to steer away from this design. A simpler solution is to just return by value:

Yeah, it's ugly. Dealing with orthogonal returns is not a graceful proposition.

Another idea is to make the T the ref'd arg, similar to how the system call read() works:

bool popNext(ref T);

This has some benefits:

1) you aren't checking a temporary for emptyness, so it fits well within a loop construct
2) you can avoid returning a dummy element in the empty range case.
3) you avoid returning a piece of possibly large data on the stack when it's probably just going to get copied anyways (although the compiler can sometimes optimize this).

One thing this does though, is remove the possibility of returning a reference to an element. Not sure if this is a bad thing, since you can always convert a reference to a value, and when operating in this mode, you are working with the lowest common denominator (as everything has to be able to implement this).

However, with foreach, you may want to have references sometimes. For that, I'd guess that foreach might have to support 3 apis: opApply, range, and stream (popNext API).

I'm not convinced that range is the best fit for ref'd data anyways, opApply is much more adept at it.

How does implementation of popNext look?

bool popNext(R)(R r, ref ElementType!R elem)
     if (isForwardRange!R)
{
     if (r.empty)
         return false;
     elem = r.front();
     r.popFront();
     return true;
}

usage:

R r;
for(ElementType!R x; r.popNext(x);)
{
   // use x
}

vs. your idea for popNext:

R r;
bool done;
// ... geez, I don't even know an easy way to do this???  does this work?
while(ref x = r.popNext(done), done)
{
   // use x
}

Still thinking, may have more ideas... How do we get a reference to the element...

-Steve

Reply via email to