Mark Lodato wrote:
> On Sat, Oct 3, 2009 at 4:46 AM, Dag Sverre Seljebotn
> <[email protected]> wrote:
>> In the interest of fixing
>>
>> http://trac.cython.org/cython_trac/ticket/281
>>
>> I'm doing some benchmarks which are rather surprising to me.
>> Explanations? Unless something is wrong it seems I can safely replace
>> __pyx_lineno and friends with thread variables (on Python versions which
>> support it, haven't checked that yet).
>>
>> In [12]: %timeit dagss.globalvar(1e3)
>> 100 loops, best of 3: 6.18 ms per loop
>>
>> In [13]: %timeit dagss.threadvar(1e3)
>> 10000 loops, best of 3: 553 micros per loop
>>
>> In [14]: %timeit dagss.threadvar(1e6)
>> 10 loops, best of 3: 153 ms per loop
>
>
> I don't get the same result at all:
>
> $ gcc -O0 -shared -I/usr/include/python2.6 -g -fPIC dagss.c -o dagss.so
>
> In [2]: %timeit dagss.globalvar(1e3)
> 100000 loops, best of 3: 10.6 µs per loop
>
> In [3]: %timeit dagss.threadvar(1e3)
> 10000 loops, best of 3: 446 µs per loop
>
> In [4]: %timeit dagss.globalvar(1e6)
> 100 loops, best of 3: 10.3 ms per loop
>
> In [5]: %timeit dagss.threadvar(1e6)
> 10 loops, best of 3: 147 ms per loop
>
> $ gcc -O1 -shared -I/usr/include/python2.6 -g -fPIC dagss.c -o dagss.so
>
> In [2]: %timeit dagss.globalvar(1e3)
> 1000000 loops, best of 3: 1.67 µs per loop
>
> In [3]: %timeit dagss.threadvar(1e3)
> 10000 loops, best of 3: 451 µs per loop
>
> In [4]: %timeit dagss.globalvar(1e6)
> 1000 loops, best of 3: 1.41 ms per loop
>
> In [5]: %timeit dagss.threadvar(1e6)
> 10 loops, best of 3: 145 ms per loop
>
> * -O2 or -O3:
>
> With these optimization levels, the compiler optimizes out the loop.
>
> * Setup:
>
> The only change I made was to change i from an 'int' to a 'long' to
> remove a compiler warning.
>
> Cython: version 0.11.3
> gcc: version 4.3.3
> OS: Ubuntu 9.04
> Processor: Intel Core 2 Duo E6300
> Compile line: gcc -shared -I/usr/include/python2.6 -g -fPIC -O0
> dagss.c -o dagss.so
Now I feel stupid ;-) I used runtests.py to compile, which turned on the
refnanny, which meant the getter/setter functions had a much larger
overhead than the function calls to the CPython lib.

Thanks for doing this, I'll just trust your numbers.

IMO this is not too horrible, and I think thread local variables are
cheap enough that we can safely use them to propagate exceptions -- *if*
we for some reason don't want __pyx_lineno and __pyx_clineno as
function-local variables. (Which we shouldn't IMO if there's any chance
at all that it can decrease normal running performance.)

--
Dag Sverre
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
