Re: [Factor-talk] Playing with python-like iterators (and efficiency of continuations)

Daniel Ehrenberg Sat, 31 Oct 2009 15:37:02 -0700

1. Exceptions would be a slow (in the current implementation) and
messy way to signal the end of an iteration. Exceptions should be used
for things that aren't likely to happen, and this is something likely.
You could base your design for next on the assoc protocol. at* looks
up the value corresponding to a key in an assoc. If the key is
present, it returns the value along with the flag t. If the key is
absent, it returns f for the value and f for the flag. Often, code
will call the word 'at' rather than 'at*', which is : at at* drop
;--just returning the value, or f if the value isn't there. But this
only works if you expect that at* shouldn't return f for the value, or
if you want to equate a value of f with a missing key, or you know the
key will be there, etc. So you could have next* and next.

Alternatively, you could have a function empty? which tests if the
iterator has been exhausted, together with a next-unsafe function that
increments the iterator and is allowed to create an invalid iterator
if it's already done. Out of these primitives, you could build next
(or next*) to safely increment the iterator, either throwing an
exception if it's done (if we don't think anyone's likely to call next
on purpose on an exhausted iterator) or return a flag about whether
it's done (if we think people are likely to call it on purpose in that
case).

If you go with a generic protocol for iterators (which I would
recommend for performance reasons; I wrote about this below), then you
should choose the base functions by what's easiest to implement, and
then expose a set of functions on top of that which turn out to be the
most useful. Probably, the situation where you call 'next' directly at
all is rare, compared to situations where you want to call 'each'.

2. Using coroutines for this is very costly and probably will always
be more expensive than writing code in a direct style. Saving and
restoring contexts has some overhead, and it's harder for the compiler
to reason about it. How would you be able to inline the computation,
even if the coroutine is known? There's probably a way, but I don't
know of any compiler that does this. It must have a cost in Python
too, but maybe the penalty isn't so great because your Python code
will already be slow for other reasons.

In general:
I like your idea of making a set of generic words for iterators. There
could be several classes that implement the iterator words: sequences,
coroutine-based generators, linked lists, deques, hashtables, etc.
This would be a bit of code to implement each of them, a little more
than if you went with coroutines because you need an explicit
representation of the cursor. But if a system like that existed, you
could write code that iterates over any type of object. You could
implement 'each', for example, in terms of iterators, and if 'like'
were part of the protocol, you could implement 'map'.

It would be great to replace the sequence protocol with this. Then,
many of the sequence words wouldn't be tied to operating on
random-access structures. Code duplication in the standard library
could be reduced.

The big challenge is doing this without performance regressions. The
code should be expressed in a way that the compiler can optimize out
the overhead of using iterator objects rather that doing the iteration
directly. This is impossible for coroutines (at least with the
compiler we have right now), but it could be possible for iterators
over sequences, if enough type information is known by the compiler.

Maybe you want to make these iterators functional. That is, maybe the
next operation should return a new iterator that stands for the new
location, without modifying the old iterator object. One advantage of
this is that the compiler understands how to do escape analysis on
immutable tuples. This is the only way I could see iterators being
implemented that wouldn't be a performance regression for sequence
operations. A disadvantage is that, if escape analysis can't be done,
it allocates more memory.

Dan

On Sat, Oct 31, 2009 at 4:46 PM, Paul Moore <[email protected]> wrote:
> I'm playing with the idea of iterators and generators from Python.
> Basically, an iterator is an object that supports a single operation,
> next, which produces the iterator's next value. Iterators can be
> created from sequences in the obvious way by iterating through the
> items.
>
> Python also supports generators, which are a limited form of coroutine:
>
> def gen():
>    yield 1
>    yield 2
>    yield 3
>
> defines a generator function which, when called, returns an iterator
> which produces the values from each yield statement in turn.
>
> There are two questions which come to mind while trying to design these:
>
> 1. What's the most idiomatic way in Factor to implement next? In
> Python, it returns a new value each time it's called, and signals that
> it has run out of values by raising a special StopIteration exception.
> This has the major advantage that iterators don't have to generate a
> value until it's needed - there's no empty? function which needs to
> know whether next will work the next time you call it - just call it
> and catch the exception. Is using an exception for control flow like
> this a reasonable option in Factor? An alternative would be to return
> an extra flag - but I suspect that will make stack handling a lot more
> messy.
>
> 2. The obvious way of implementing generators in Factor would be on
> top of continuations/coroutines. But one of the reasons Python's
> generators are limited is that they can be efficient (full coroutines
> are a lot more costly than generators). Which makes me wonder whether
> continuations might be expensive in Factor.
>
> Consider the following 2 implementations of an iterator over a sequence:
>
> First, using a tuple class
>
>    TUPLE: seqiter seq pos ;
>    M: sequence iter 0 seqiter boa ;
>    M: seqiter next [ [ seq>> ] [ pos>> ] bi nth ] [ [ 1 + ]
> change-pos drop ] bi
>
> Second, using coroutines
>
>    : iter ( seq -- iter ) '[ _ [ coyield* ] map ] cocreate ;
>    : next ( iter -- elt ) *coresume ;
>
> The first is clearly specific to sequences, and while it's not
> complex, having to manage the position variable manually is mildly
> fiddly. On the other hand, it's pretty easy to see how the coroutine
> version works if you know how map works, and generalising to any
> function that spits out values one after another is pretty obvious.
>
> So I guess the question is, are continuations/coroutines costly enough
> to make them a bad choice for something like this? I know this is
> premature optimisation, but I'm curious nevertheless...
>
> Any other comments on this idea would be gratefully accepted, as well.
>
> Paul.
>
> ------------------------------------------------------------------------------
> Come build with us! The BlackBerry(R) Developer Conference in SF, CA
> is the only developer event you need to attend this year. Jumpstart your
> developing skills, take BlackBerry mobile applications to market and stay
> ahead of the curve. Join us from November 9 - 12, 2009. Register now!
> http://p.sf.net/sfu/devconference
> _______________________________________________
> Factor-talk mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/factor-talk
>

------------------------------------------------------------------------------
Come build with us! The BlackBerry(R) Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay 
ahead of the curve. Join us from November 9 - 12, 2009. Register now!
http://p.sf.net/sfu/devconference
_______________________________________________
Factor-talk mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/factor-talk

Re: [Factor-talk] Playing with python-like iterators (and efficiency of continuations)

Reply via email to