Vitja Makarov, 26.12.2011 20:07:
2011/12/25 Stefan Behnel:
Stefan Behnel, 21.12.2011 21:17:
Vitja Makarov, 21.12.2011 19:48:
Some time ago we were talking about generators optimization by copying
local variables from closure into local scope.
Yes, I think that will make it easier for the C compiler to make
optimistic assumptions about external values.
Now I think that should be a good idea to implement this for both
generators and regular closure functions. So local var will be used
for reference and assignment should be made to both copies. Of course
there are some variables that shouldn't be copied: non-local vars,
arrays, C++ classes and structures.
Basically, anything that external code can modify. That makes it a bit
tricky to do it also for 'normal' closure functions - the whole idea is
that there is more than one function that can refer to a variable.
Also it may be a good idea to move outer scope pointer into local
variable.
So I'm wondering what is a good test to measure actual speedup?
http://blog.behnel.de/index.php?p=163
Just take the plain Python versions of the iterparse functions and compare
them before and after the change. The raw C implementation in CPython
gives a good baseline.
Actually, it would be generally interesting to run the Cython versions
through callgrind to see where the time is actually being spent.
Another useful test is the "nqueens" benchmark from the CPython test suite.
It's regularly run on Jenkins in Py2.7 and 3.3.
https://sage.math.washington.edu:8091/hudson/view/bench/
Note that it mostly uses generator expressions, which could easily benefit
from a couple of further optimisations by specialising them, e.g. by
providing a length hint.
http://trac.cython.org/cython_trac/ticket/756
I have implemented local variable copying, it could be found here:
https://github.com/vitek/cython/tree/_copy_closure
Cool.
I didn't noticed significant speedup running nqueens test. Indeed I'm
not sure it's speedup.
It's all about<+-2%. Perhaps some better test is required.
Well, yes, I'd guess that generator expressions are really not the target
of this optimisation. They simply don't do enough in between two iterations.
Btw, I got ~8% speedup for this dummy test:
def foo():
cdef int i, r
cdef list o
o = []
def bar():
return len(o)
for i in range(10000000):
bar()
r += len(o)
return r
Any speedup will help someone, I guess. And it's good to know that it makes
a difference if the closure code isn't plain trivial.
What's iterparse()?
Ah, sorry, my bad. Too much lxml stuff lately, I guess. I really meant
"itertools". See the blog link above.
Also, consider using callgrind for performance analysis. For one, it'll
print the number of executed instructions after a test run. For Cython
generated ode, it's usually a good sign if that decreases after a change.
For a deeper analysis of such a profiling run, you can then use kcachegrind.
Stefan
_______________________________________________
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel