On 05/22/2011 02:33 PM, Stefan Behnel wrote:
Hi,

I've been looking at the nqueens benchmark for a while, and I think it's
actually not that a bad benchmark for generators.

http://hg.python.org/benchmarks/file/tip/performance/bm_nqueens.py

A better implementation only for Py2.7/Py3 is here:

https://github.com/cython/cython/blob/master/Demos/benchmarks/nqueens.py

Cython currently runs the first implementation about as fast as Py3.3:

https://sage.math.washington.edu:8091/hudson/job/cython-devel-pybenchmarks-py3k/lastSuccessfulBuild/artifact/chart.html


and the second one more than 3x as fast:

https://sage.math.washington.edu:8091/hudson/view/bench/job/cython-devel-cybenchmarks-py3k/lastSuccessfulBuild/artifact/chart.html


However, I think there's still some space for improvements, and local
variables are part of that. For generator functions that do non-trivial
things between yields, I think that local variables will quickly become
a bottleneck. Currently, they are always closure fields, so any access
to them will use a pointer indirection to a foreign struct, originally
passed in as an argument to the function. Given that generators often do
Python object manipulation through C-API calls, any such call will
basically require the C compiler to assume that all values in the
closure may have changed, thus disabling any optimisations for them. The
same applies to many other object related operations or pointer
operations (even DECREF!), as the C compiler cannot know that the
generator function owns the closure during its lifetime exclusively.

I think it would be worth changing the current implementation to use
local C variables for local Cython variables in the generator, and to
copy the values back into/from the closure around yields. I'd even let
local Python references start off as NULL when the generator is created,
given that Vitek's branch can eliminate None initialisations now.

Keep in mind that if speed is the objective, another idea is to use real C coroutines. This would likely be faster than anything we can make up ourselves; a single stack jump is bound to be faster than copying things in and out of the stack.

Of course, probably more work. And then portability as the API is seperate for Windows (fibers) and POSIX (makecontext). But I don't think there's a lack of compatability layer libraries which unite the platform-specific APIs.

Dag Sverre
_______________________________________________
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel

Reply via email to