I really like that simplified, "yield"-based syntax. However, I believe that it conflicts fundamentally with D's confusing concept of generators.

Most other languages that offer generators (e.g. Python) view the generator as an object that spits out values one-by-one. This is very similar to a co-routine.

Purely for performance reasons, D approaches the issue in a backward way, passing the loop body as a delegate. Indeed, this may be faster than full-fledged co-routines, but it is rather difficult to map a simplified coroutine-like syntax based on 'yield' statements to this very different concept by mere "syntactic sugar". Furthermore, the D concept scales awkwardly to combining multiple generators in one loop or using nested generators.
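
To make the contrast concrete, here is a minimal sketch in Python (chosen only because it is the other language used in this thread). The names count_to_internal and count_to_external are made up for illustration. The first function mimics the opApply pattern, where the loop body is handed in as a callback and a non-zero result means "break". The second is an ordinary generator that the caller pulls values from.

```python
def count_to_internal(n, body):
    """opApply-style: the generator drives the loop and calls the body."""
    for i in range(n):
        result = body(i)      # run the loop body for this element
        if result:            # non-zero result: the body asked to break
            return result
    return 0


def count_to_external(n):
    """Python-style: the caller drives the loop and pulls values."""
    for i in range(n):
        yield i


# Internal iteration: "break" has to be encoded in the return value.
seen_internal = []

def body(x):
    seen_internal.append(x)
    return 1 if x == 2 else 0   # ask to break after seeing 2

count_to_internal(5, body)

# External iteration: the loop is an ordinary loop with an ordinary break.
seen_external = []
for x in count_to_external(5):
    seen_external.append(x)
    if x == 2:
        break
```

Both variants visit 0, 1, 2 and then stop. The difference is who owns the loop, and that is what makes a 'yield'-to-delegate translation more than a textual rewrite.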

In fact, there is a simple but powerful way to design purely stack-based generators, sticking to the straightforward concept of co-routines without the overhead of full co-routines and without the need for delegates (inspired by the Sather language):

Make a generator a co-routine but restrict its lifetime to the inside of a loop. When the loop starts, it calls all the contained generators to initialize their local data on the regular stack. When the generators yield, the stack-pointer is *not* reset (as it would be at a return) but kept at its value. The generator routines can therefore be called again within the same loop and find their local data still on the stack. When the loop finishes, the stack pointer is reset to its initial value and all the local data of all the contained generators is immediately released.

This design does not interfere with the regular use of the stack for regular function calls within the loop. Nested loops are possible, as are nested generators or loops that combine values from multiple generators. All with a straightforward syntax and all based on pure stack operations that are no more costly than regular function calls and can be inlined just as easily.

This concept even allows calls to generators to be placed at arbitrary positions inside a loop, rather than only at the beginning. The regular D loop:
    foreach (x; mygenerator) {
        do_something(x);
    }
could be written as something like
    loop {
        do_something(#mygenerator);
    }
where I have blindly introduced a "#" operator to signify a generator call that may either yield a value or break the loop. (I don't care about the exact syntax, but it should stick out somehow.)

This can be generalised as
    loop {
        do_something();
        do_something_else(#mygenerator);
        yet_something_else(#anothergenerator);
    }
where each expression containing a # generator call may either yield a value or break the loop.
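
The intended semantics (not the stack mechanism itself) can be emulated roughly in Python. This is only a sketch under assumptions: run_loop and body are hypothetical names, next(...) stands in for the "#" operator, and Python's generators are heap-allocated objects, so none of the performance argument carries over. What it does show is that the generators are created when the loop starts, that any one of them running dry breaks the whole loop, and that they are dropped when the loop ends.

```python
def run_loop(generator_factories, body):
    # The generators are created when the loop starts ...
    gens = [make() for make in generator_factories]
    try:
        while True:
            body(*gens)
    except StopIteration:
        pass  # ... any exhausted generator breaks the loop ...
    # ... and all of them go out of scope when the loop ends.


log = []

def body(mygenerator, anothergenerator):
    log.append("tick")                   # do_something()
    log.append(next(mygenerator))        # do_something_else(#mygenerator)
    log.append(next(anothergenerator))   # yet_something_else(#anothergenerator)


run_loop([lambda: iter([1, 2, 3]), lambda: iter("ab")], body)
```

Here the third pass still runs the first two statements and only breaks at the third, because anothergenerator is exhausted first: statements before a failing "#" call are executed, statements after it are not.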

The clear restriction of this is that "native" generators of this design are not stand-alone objects, cannot survive outside a loop and work only within a single thread. More general generators, however, can easily be implemented as objects that offer a "native" generator as interface.

The only difficulty of this approach is that it has to be done at the assembler level because an unusual calling convention is needed. Typical high-level languages do not allow an emulation of this mechanism. I do not know about byte-code representations.

This whole mechanism is inspired by the Sather language, whose design was the first that would have allowed it (even though its compiler never actually implemented it this way). To my knowledge, all other languages that offer generators broke this concept by allowing generators to exist beyond the lifetime of a loop, and they therefore depend on heap-allocated data.

I don't expect that D would ever adopt such a breaking change, even though it would be the most efficient implementation of generators while retaining the conceptually simple definition of co-routines. Still, I find the concept just so beautiful that I have to talk about it once in a while...

Greetings,
Norbert



bearophile wrote:
D currently has two ways to define a generator. The first is the older one, with
opApply (and opApplyReverse):
int opApply(int delegate(ref Type [, ...]) dg);


The syntax of opApply is terrible: hard to remember, bug-prone, noisy, 
intrusive. And this syntax doesn't even gain performance, because dmd is not 
very good at inlining here.
Several times in the past I have said that a very good syntax for this is the 
Python one:

def foo():
  yield 5

It's really easy to use and remember, not bug prone. This possible syntax is 
about 75 times better than the current opApply:

yield(int) foo() {
  yield 5;
}


That's sugar for something like:

struct foo {
    int opApply(int delegate(ref int) dg) {
        int result;
        int r = 5;
        result = dg(r);
        if (result)
            goto END;
      END:
        return result;
    }
}


Python syntax was good enough that C# has copied it.


The other way to create a struct/class generator in D2 is with the Range 
protocol, with the methods that ask for the next item, etc. I have seen that 
such generators sometimes are harder to write, because you have to manage the 
state yourself, but they can be more efficient and they are more powerful.

Both ways are useful, because they suit different situations. Each is
the inside-out version of the other.
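
As a rough illustration of "managing the state yourself", here is a Python sketch of the same countdown written both ways. CountdownRange is a made-up class whose members empty, front and pop_front are modelled on D's range primitives empty, front and popFront. It is an analogy, not D code.

```python
class CountdownRange:
    """Range-style: the iteration state lives in explicit fields."""

    def __init__(self, start):
        self.current = start

    @property
    def empty(self):
        return self.current <= 0

    @property
    def front(self):
        return self.current

    def pop_front(self):
        self.current -= 1


def countdown_gen(start):
    """Generator-style: the state lives in the suspended function."""
    while start > 0:
        yield start
        start -= 1


r = CountdownRange(3)
from_range = []
while not r.empty:          # the caller asks for each item in turn
    from_range.append(r.front)
    r.pop_front()

from_gen = list(countdown_gen(3))
```

Both produce 3, 2, 1. The range version needs more code for the same result, but its state can be inspected or copied between steps, which is one way it is more powerful.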


The implementations of generators in C# and Python (and D, but not Lua) suffer
from a problem that increases the computational complexity of the code when
generators are nested.
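
A small Python sketch of where the extra cost comes from. The tree layout (left, value, right) and the counter are assumptions made for this illustration. Every value produced deep inside a recursive generator has to be re-yielded once by each enclosing generator before it reaches the caller.

```python
relays = 0  # counts how often a value is passed up one level


def inorder(node):
    """Yield the values of a tree given as nested (left, value, right) tuples."""
    global relays
    if node is None:
        return
    left, value, right = node
    for x in inorder(left):
        relays += 1      # x is only being handed on to our own caller
        yield x
    yield value
    for x in inorder(right):
        relays += 1
        yield x


# Build a degenerate, left-leaning tree holding 1, 2, 3, 4.
tree = None
for k in range(1, 5):
    tree = (tree, k, None)

values = list(inorder(tree))
```

For this chain of four nodes the traversal yields four values but performs 3 + 2 + 1 = 6 extra relays. For a chain of n nodes that is n(n-1)/2, so the traversal is quadratic where a plain recursive walk would be linear.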

Both Python and C# have found similar solutions that improve both the
syntax and the performance (neither language has implemented its solution yet).

Python version:
http://python.org/dev/peps/pep-0380/

With that, this code:

def foo():
  for x in some_iterable:
    yield x


Becomes:

def foo():
  yield from some_iterable


In C# it can be called "yield foreach":

http://kirillosenkov.blogspot.com/2007/10/yield-foreach.html

http://connect.microsoft.com/VisualStudio/feedback/details/256934/yield-return-to-also-yield-collections

http://blogs.msdn.com/wesdyer/archive/2007/03/23/all-about-iterators.aspx


As D2 starts coming out of its embryonic state, it too will probably have to
face this problem with iterators. In the meantime I hope the opApply syntax
will be replaced by something better (the yield I've shown).

Bye,
bearophile
