Re: ideas about ranges

Steven Schveighoffer Fri, 22 May 2009 12:40:21 -0700

On Fri, 22 May 2009 11:48:34 -0400, Andrei Alexandrescu<[email protected]> wrote:

Steven Schveighoffer wrote:
The thread discussing what to do for input ranges vs. forward rangesgot me thinking.The range concept may be defined backwards in terms of which is morespecialized. Consider that an input range is always usable as astream. But a stream is not easy to use as an input range (the rangeprimitive).
The hierarchy of concepts is:

input range < forward range < bidirectional range < random-access range

Having length, having slicing, and infinity are orthogonal properties.

Right, a stream doesn't fit really well within any of those categories,because it's even more simple than an input range. That was my point.You have to bolt functionality and a buffer on top of it to get it to fitthe model. Maybe this is inevitable, I was wondering if it would beeasier to deal with if the range paradigm had another simpler constructthan input range.

Case in point, a file. To fit into the most primitive range concept,it must define 3 functions:
 front()
popFront()
empty()
 empty is easy, it's just "am I at end of file"
But front is not so easy. In order to know what's at the front, youneed to read it. And at that point you've altered the underlying file.
Not even empty is easy. If you're defining a file range that gives youe.g. whitespace-separated integers, you can't have empty return feof(p)because there might be only spaces through the end of file. So you needto cache one element ahead to be able to implement empty().

Even a stream that does nothing but read straight characters, but has noidea how long the stream is, such as stdin, cannot specify empty withouttrying to read more data. I see your point.

I've discussed this with Walter for the longest time. The correctprimitive for an input range that needs to do no caching is:
Tuple!(T, "data", bool, "got") popNext();
When you call popNext it decides in the same place whether there isdata, and also gives it to you. If "got" comes off as false, you knowyou're done.

Yes, I see now that my model won't cut it. I had a feeling that my modelwasn't quite correct, but as you say, yours still is clunky.


Looking at other iteration models:

C#: similar to D ranges, but moveNext (a.k.a. popFront) is required tostart the enumeration. (yuck!)Java: This is similar to what I was proposing: only has "get next element"and "has more elements".

I'm not sure if the right answer has been found by any of these yet. Ican feel there is a solution that should be simple, and accurately reflectthe underlying objects' API, but I have to think more about it.

What sucks about this is, we have to introduce a buffer in the range,just because we can't look at data until we've popped it. Not onlythat, but calling front a file before anything is read requires a checkto fill the buffer in case we haven't read anything yet. This could bealleviated by filling in the constructor, but it's still more complexthan necessary. Consider also that the underlying stream might alreadybe buffered, so we are buffering a buffer.
Yah, I agree. I've implemented a few of those in Phobos. It would begood to have a more solid solution. This problem, however, also collideswith returning an rvalue versus a ref. A container wants to return aref. An input range wants to return by value. Then it's difficult to usea container as an input range.

don't you mean the other way around? That is, it's easy to convert a refinto a value, but not vice versa.

But since the primitives for input range are set by the compiler (ituses them to do foreach), we have to implement them to make our streamranges friendly to foreach.
 Round peg, meet square hole.
 But what are the true requirements for iteration using foreach?
 1. check if there's anything left
2. get the next element
Step 2 now is split into popFront and front. So a foreach loop is arewritten for loop like this:
 foreach(x; range)
{
  ...
}
 translates to:
{
  auto _r = range;
  while(!_r.empty)
  {
    auto x = _r.front();
    _r.popFront();
    ...
  }
}
What if step 2 was one function? Call it popNext(), and make itequivalent to calling _r.front() and popFront() in one step on rangesthat implement that method.
This will not solve all problems. It will improve things like definingranges that read one character at a time from a stream. But a functionthat does read-and-check-for-empty in one shot is the true solution.

Yes, agreed. I was focusing on not having to keep around transient datato avoid reordering or caching issues. But it does need to be all-in-oneas you have shown.

How does this work with foreach?
 {
  auto _r = range;
  while(!_r.empty)
  {
    auto x = _r.popNext();
    ...
  }
}
 Basically, the same code, one less line.
Consider that any range defined today with front() and popFront() canimplement popNext (and popNext could be an external function if we canget 3015 resolved).
 So what I think we may need is a different range primitive:
 An iterable range defines: (name to be decided)
 bool empty()
T popNext()
 An input range is an iterable range that also defines:
 T front();
popFront();
I think you are using "iterable range" instead of "input range" and"input range" instead of "forward range". This is compared to STLterminology, which I borrowed.

I don't know what the terminology should be. I was using yourterminology. when I said input range, I meant input range as you definedit on your std.range documentation. An input range has the followingmethods:


T front()
void popFront()
bool empty()

and by composition: T popNext() (this wasn't in the docs, but it'ssomething that we're throwing around here)

An "iterable" range, or one that does not have persistent access toelements and must modify the range and get data at the same time, wassupposed to be a subset of that. I was just defining it as T popNext() andempty(), but it looks like those have to be combined into one function asyou say.

(this might be good for many forward ranges.) Then we nicely ask Walterto allow inlining of such functions, and finally implement this instd.range:
ref ElementType!R popNext(R)(R r, ref bool gotSomething)
     if (isForwardRange!R)
{
     if (r.empty)
     {
         gotSomething = false;
         static typeof(return) dumbo;
         return dumbo;
     }
     auto result = &(r.front());
     r.popFront();
     gotSomething = true;
     return *result;
}
Admittedly this looks considerably messier than it ought to, and that'snever a good sign. For starters, I could predict with accuracy thesneering remarks of certain posters who shall remain unnamed. Worse,they'd have a point (only) this one time :o). Messiness was one of thefactors that made me decide to steer away from this design. A simplersolution is to just return by value:

Yeah, it's ugly. Dealing with orthogonal returns is not a gracefulproposition.

Another idea is to make the T the ref'd arg, similar to how the systemcall read() works:


bool popNext(ref T);

This has some benefits:

1) you aren't checking a temporary for emptyness, so it fits well within aloop construct

2) you can avoid returning a dummy element in the empty range case.

3) you avoid returning a piece of possibly large data on the stack whenit's probably just going to get copied anyways (although the compiler cansometimes optimize this).

One thing this does though, is remove the possibility of returning areference to an element. Not sure if this is a bad thing, since you canalways convert a reference to a value, and when operating in this mode,you are working with the lowest common denominator (as everything has tobe able to implement this).

However, with foreach, you may want to have references sometimes. Forthat, I'd guess that foreach might have to support 3 apis: opApply, range,and stream (popNext API).

I'm not convinced that range is the best fit for ref'd data anyways,opApply is much more adept at it.


How does implementation of popNext look?

bool popNext(R)(R r, ref ElementType!R elem)
     if (isForwardRange!R)
{
     if (r.empty)
         return false;
     elem = r.front();
     r.popFront();
     return true;
}

usage:

R r;
for(ElementType!R x; r.popNext(x);)
{
   // use x
}

vs. your idea for popNext:

R r;
bool done;
// ... geez, I don't even know an easy way to do this???  does this work?
while(ref x = r.popNext(done), done)
{
   // use x
}

Still thinking, may have more ideas... How do we get a reference to theelement...


-Steve

Re: ideas about ranges

Reply via email to