Antoine Pitrou, 22.11.2009 18:06:
> Stefan Behnel <stefan...@...> writes:
>> Just to be sure, not only the computed goto itself is a non-standard
>> feature, but also taking the address of a label using "&&", right?
> 
> Indeed. I'm not sure why you care about computed gotos here. Most generators
> only have a single yield statement, so you'll have a single-case switch
> statement (or two cases if you take into account the case where the generator 
> is
> being entered rather than resumed), so the code will be quite simple anyway 
> and
> the CPU's branch predictor should do a good job.

Agreed. However, the common single-yield case can always be optimised into

    if (likely(closure._resume_label == ...))
       goto __L1_resume_from_yield;

('likely' being a branch prediction hint here), in which case it wouldn't
matter if we use an int or a pointer. One test is needed anyway, as the
label is initially 0. As a result, you'd just get an asymptotic 100% branch
prediction with a single initial branch.

The main advantage of computed gotos for the multiple-yield cases is that
the code generation itself becomes simpler, as the generator function
wouldn't have to know the yield labels (which are created at code
generation time and only at need, so only the yield node itself knows if it
gets generated). But given that they are not portable, and assuming that
there are cases where knowing the labels is simply faster (such as the
single-yield case above), we will need to propagate that information
anyway, so we can just as well drop the computed gotos completely. But you
got me thinking about the branch prediction.

Disclaimer: I'm actually aware that this deals with micro-optimisations
that might not turn into any performance gain at all. I just found it
interesting to look at what kind of code we can expect to see.

For the multiple yield case, I think a scenario like the following would be
quite common:

    def gen():
        while 1:
            # calculate some correlated values x,y,z
            yield x
            yield y
            yield z

In that case, there will be zero branch prediction, as the code will jump
to a different label on each call, despite them being close in the code and
doing exactly the same thing.

Looking at

http://docs.python.org/library/itertools.html

another common scenario would be this:

    def gen(flag):
        if flag:
            while 1:
                yield ...
        else:
            while 1:
                yield ...

in which case the branch prediction would actually work pretty well, but
would depend on how the switch code is arranged. Any ideas how computed
gotos would perform here? They'd always jump to the same address, after
all, but I have no idea if the CPU learns from that...

A third scenario uses an initial setup run (also from itertools):

    def gen():
        for x in something:
            yield f(x)
        while 1:
            for x in something_else:
                 yield f(x)

Here, the branch prediction would only fail in two cases during the life
cycle (at startup and when switching to the while loop), and the final
performance would depend on the switch statement again.

Stefan

_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev

Reply via email to