1. Exceptions would be a slow (in the current implementation) and messy way to signal the end of an iteration. Exceptions should be used for things that aren't likely to happen, and this is something likely. You could base your design for next on the assoc protocol. at* looks up the value corresponding to a key in an assoc. If the key is present, it returns the value along with the flag t. If the key is absent, it returns f for the value and f for the flag. Often, code will call the word 'at' rather than 'at*', which is : at at* drop ;--just returning the value, or f if the value isn't there. But this only works if you expect that at* shouldn't return f for the value, or if you want to equate a value of f with a missing key, or you know the key will be there, etc. So you could have next* and next.
Alternatively, you could have a function empty? which tests if the iterator has been exhausted, together with a next-unsafe function that increments the iterator and is allowed to create an invalid iterator if it's already done. Out of these primitives, you could build next (or next*) to safely increment the iterator, either throwing an exception if it's done (if we don't think anyone's likely to call next on purpose on an exhausted iterator) or return a flag about whether it's done (if we think people are likely to call it on purpose in that case). If you go with a generic protocol for iterators (which I would recommend for performance reasons; I wrote about this below), then you should choose the base functions by what's easiest to implement, and then expose a set of functions on top of that which turn out to be the most useful. Probably, the situation where you call 'next' directly at all is rare, compared to situations where you want to call 'each'. 2. Using coroutines for this is very costly and probably will always be more expensive than writing code in a direct style. Saving and restoring contexts has some overhead, and it's harder for the compiler to reason about it. How would you be able to inline the computation, even if the coroutine is known? There's probably a way, but I don't know of any compiler that does this. It must have a cost in Python too, but maybe the penalty isn't so great because your Python code will already be slow for other reasons. In general: I like your idea of making a set of generic words for iterators. There could be several classes that implement the iterator words: sequences, coroutine-based generators, linked lists, deques, hashtables, etc. This would be a bit of code to implement each of them, a little more than if you went with coroutines because you need an explicit representation of the cursor. But if a system like that existed, you could write code that iterates over any type of object. You could implement 'each', for example, in terms of iterators, and if 'like' were part of the protocol, you could implement 'map'. It would be great to replace the sequence protocol with this. Then, many of the sequence words wouldn't be tied to operating on random-access structures. Code duplication in the standard library could be reduced. The big challenge is doing this without performance regressions. The code should be expressed in a way that the compiler can optimize out the overhead of using iterator objects rather that doing the iteration directly. This is impossible for coroutines (at least with the compiler we have right now), but it could be possible for iterators over sequences, if enough type information is known by the compiler. Maybe you want to make these iterators functional. That is, maybe the next operation should return a new iterator that stands for the new location, without modifying the old iterator object. One advantage of this is that the compiler understands how to do escape analysis on immutable tuples. This is the only way I could see iterators being implemented that wouldn't be a performance regression for sequence operations. A disadvantage is that, if escape analysis can't be done, it allocates more memory. Dan On Sat, Oct 31, 2009 at 4:46 PM, Paul Moore <[email protected]> wrote: > I'm playing with the idea of iterators and generators from Python. > Basically, an iterator is an object that supports a single operation, > next, which produces the iterator's next value. Iterators can be > created from sequences in the obvious way by iterating through the > items. > > Python also supports generators, which are a limited form of coroutine: > > def gen(): > yield 1 > yield 2 > yield 3 > > defines a generator function which, when called, returns an iterator > which produces the values from each yield statement in turn. > > There are two questions which come to mind while trying to design these: > > 1. What's the most idiomatic way in Factor to implement next? In > Python, it returns a new value each time it's called, and signals that > it has run out of values by raising a special StopIteration exception. > This has the major advantage that iterators don't have to generate a > value until it's needed - there's no empty? function which needs to > know whether next will work the next time you call it - just call it > and catch the exception. Is using an exception for control flow like > this a reasonable option in Factor? An alternative would be to return > an extra flag - but I suspect that will make stack handling a lot more > messy. > > 2. The obvious way of implementing generators in Factor would be on > top of continuations/coroutines. But one of the reasons Python's > generators are limited is that they can be efficient (full coroutines > are a lot more costly than generators). Which makes me wonder whether > continuations might be expensive in Factor. > > Consider the following 2 implementations of an iterator over a sequence: > > First, using a tuple class > > TUPLE: seqiter seq pos ; > M: sequence iter 0 seqiter boa ; > M: seqiter next [ [ seq>> ] [ pos>> ] bi nth ] [ [ 1 + ] > change-pos drop ] bi > > Second, using coroutines > > : iter ( seq -- iter ) '[ _ [ coyield* ] map ] cocreate ; > : next ( iter -- elt ) *coresume ; > > The first is clearly specific to sequences, and while it's not > complex, having to manage the position variable manually is mildly > fiddly. On the other hand, it's pretty easy to see how the coroutine > version works if you know how map works, and generalising to any > function that spits out values one after another is pretty obvious. > > So I guess the question is, are continuations/coroutines costly enough > to make them a bad choice for something like this? I know this is > premature optimisation, but I'm curious nevertheless... > > Any other comments on this idea would be gratefully accepted, as well. > > Paul. > > ------------------------------------------------------------------------------ > Come build with us! The BlackBerry(R) Developer Conference in SF, CA > is the only developer event you need to attend this year. Jumpstart your > developing skills, take BlackBerry mobile applications to market and stay > ahead of the curve. Join us from November 9 - 12, 2009. Register now! > http://p.sf.net/sfu/devconference > _______________________________________________ > Factor-talk mailing list > [email protected] > https://lists.sourceforge.net/lists/listinfo/factor-talk > ------------------------------------------------------------------------------ Come build with us! The BlackBerry(R) Developer Conference in SF, CA is the only developer event you need to attend this year. Jumpstart your developing skills, take BlackBerry mobile applications to market and stay ahead of the curve. Join us from November 9 - 12, 2009. Register now! http://p.sf.net/sfu/devconference _______________________________________________ Factor-talk mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/factor-talk
