Purely stack-based generators

Norbert Nemec Thu, 18 Mar 2010 04:35:51 -0700

I really like that simplified, "yield"-based syntax. However, I believethat it conflicts fundamentally with D's confusing concept of generators.

Most other languages that offer generators (e.g. Python) view thegenerator as an object that spits out values one-by-one. This is verysimilar to a co-routine.

Purely for performance reasons, D approaches the issue in a backwardway, passing the loop body as a delegate. Indeed, this may be fasterthan full-fledged co-routines, but it is rather difficult to map asimplified coroutine-like syntax based on 'yield' statements to thisvery different concept by mere "syntactic sugar". Furthermore, the Dconcept scales awkwardly to combining multiple generators in one loop orusing nested generators.

In fact, there is a simple but powerful way to design purely stack-basedgenerators, sticking to the straightforward concept of co-routineswithout the overhead of full co-routines and without the need fordelegates (inspired by the Sather language):

Make a generator a co-routine but restrict its lifetime to the inside ofa loop. When the loop starts, it calls all the contained generators toinitialize their local data on the regular stack. When the generatorsyield, the stack-pointer is *not* reset (as it would be at a return) butkept at its value. The generator routines can therefore be called againwithin the same loop and find their local data still on the stack. Whenthe loop finishes, the stack pointer is reset to its initial value andall the local data of all the contained generators is immediately released.

This design does not interfere with the regular use of the stack forregular function calls within the loop. Nested loops are possible, asare nested generators or loops that combine values from multiplegenerators. All with a straightforward syntax and all based on purestack operations that are no more costly than regular function calls andcan be inlined just as easily.

This concept even allows to place calls to generators at arbitrarypositions inside a loop, rather than at the beginning only. The regularD loop:

    foreach(auto x;mygenerator) {
        do_something(x);
    }
could be written as something like
    loop {
        do_something(#mygenerator);
    }

where I have blindly introduced a "#" operator to signify a generatorcall that may either yield a value or break the loop. (I don't careabout the exact syntax, but it should stick out somehow.


This can be generalised as
    loop {
        do_something();
        do_something_else(#mygenerator);
        yet_something_else(#anothergenerator);
    };

where each expression containing a # generator call may either yield avalue or break the loop.

The clear restriction of this is that "native" generators of this designare not stand-alone objects, cannot survive outside a loop and work onlywithin a single thread. More general generators, however, can easily beimplemented as objects that offer a "native" generator as interface.

The only difficulty of this approach is that it has to be done at theassembler level because a unusual calling convention is needed. Typicalhigh-level languages do not allow an emulation of this mechanism. I donot know about byte-code representations.

This whole mechanism is inspired by the Sather languages that firstwould have allowed it (even though that compiler never actuallyimplemented it this way). To my knowledge, all other languages thatoffer generators did break this concept by allowing generators to existbeyond the lifetime of a loop and therefore depend on heap-allocated data.

I don't expect that D would ever adopt such a breaking change, eventhough it would be the most efficient implementation of generators whileretaining the conceptually simple definition of co-routines. Still, Ifind the concept just so beautiful that I have to talk about it once ina while...


Greetings,
Norbert



bearophile wrote:

D currently has two ways to define a generator, the first is the older one with 
opApply (And opApplyReverse):
int opApply(int delegate(ref Type [, ...]) dg);


The syntax of opApply is terrible: hard to remember, bug-prone, noisy, 
intrusive. And this syntax doesn't even gain performance, because dmd is not 
very good at inlining here.
Several times in the past I have said that a very good syntax for this is the 
Python one:

def foo():
  yield 5

It's really easy to use and remember, not bug prone. This possible syntax is 
about 75 times better than the current opApply:

yield(int) foo() {
  yield 5;
}


That's sugar for something like:

struct foo {
    int opApply(int delegate(ref int) dg) {
        int result;
        int r = 5;
        result = dg(r);
        if (result)
            goto END;
      END:
        return result;
    }
}


Python syntax was good enough that C# has copied it.


The other way to create a struct/class generator in D2 is with the Range 
protocol, with the methods that ask for the next item, etc. I have seen that 
such generators sometimes are harder to write, because you have to manage the 
state yourself, but they can be more efficient and they are more powerful.

Both ways are useful, because they are useful for different situations. They 
are one the inverted-out version of the other.


The implementations of generators in C# and Python (and D, but not Lua) suffer 
of a problem, that increases the computational complexity of the code when 
there is a nested generator.

And both Python and C# have found similar solutions that improve both the 
syntax and the performance (both languages have not yet implemented it).

Python version:
http://python.org/dev/peps/pep-0380/

With that this code:

def foo():
  for x in some_iterable:
    yield x


Becomes:

def foo():
  yield from some_iterable


In C# it can be called "yield foreach":

http://kirillosenkov.blogspot.com/2007/10/yield-foreach.html

http://connect.microsoft.com/VisualStudio/feedback/details/256934/yield-return-to-also-yield-collections

http://blogs.msdn.com/wesdyer/archive/2007/03/23/all-about-iterators.aspx


As D2 will start coming out of its embryo state, it too will probably have to 
fece this problem with iterators. In the meantime I hope the opApply syntax 
will be replaced by something better (the yield I've shown).

Bye,
bearophile

Purely stack-based generators

Reply via email to