On Aug 23, 2008, at 9:00 AM, Stefan Behnel wrote:

> Hi,
>
> I found some time to do a pretty complete rewrite of the argument unpacking
> code, i.e. the code that unpacks positional args and keyword arguments from
> the Python calling convention into the local variables of a function.
> Previously, we were using PyArg_ParseTupleAndKeywords with a couple of added
> optimisations for common cases. I replaced this with a dedicated
> implementation that is generated specifically for the signature of the
> function. It uses extremely well optimisable switch statements in the C code
> (and no loops unless you use *args/**kwargs or in some of the error cases).
>
> The result is that Cython now generates function init code that is faster than
> Python in most cases, even in many cases where you pass keyword arguments
> (which wasn't the case before at all). It's actually a lot faster than the
> last Cython release, especially for keyword arguments, and quite a bit faster
> than Python, which has impressively fast generic argument handling code. I
> attached some numbers below.
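For readers following along, the idea of a signature-specific unpacker can be sketched roughly in Python. This is only an illustration for the optional(a=1, b=2, c=3, d=4) benchmark function, not the C code Cython actually generates: the name unpack_optional and the error messages are made up here, the chain of "if nargs >= N" tests stands in for a C switch with fall-through, and the keyword part uses a loop for brevity where the description above says the generated code avoids loops.

```python
def unpack_optional(args, kwargs):
    """Hypothetical hand-written unpacker for the signature (a=1, b=2, c=3, d=4)."""
    names = ("a", "b", "c", "d")
    values = [1, 2, 3, 4]  # start from the defaults
    nargs = len(args)
    if nargs > 4:
        raise TypeError("at most 4 positional arguments (%d given)" % nargs)

    # Stand-in for a C switch on nargs with fall-through:
    # case 4 fills slot 3, then falls through to case 3, and so on.
    if nargs >= 4:
        values[3] = args[3]
    if nargs >= 3:
        values[2] = args[2]
    if nargs >= 2:
        values[1] = args[1]
    if nargs >= 1:
        values[0] = args[0]

    if kwargs:
        found = 0
        # Ask the keyword dict for each argument name separately.
        for i, name in enumerate(names):
            if name in kwargs:
                if i < nargs:
                    raise TypeError("multiple values for argument %r" % name)
                values[i] = kwargs[name]
                found += 1
        if found != len(kwargs):
            unexpected = sorted(set(kwargs) - set(names))
            raise TypeError("unexpected keyword argument %r" % unexpected[0])
    return tuple(values)
```

Calling unpack_optional((1, 2), {}) fills the first two slots from the positional tuple and leaves the defaults for c and d in place, which corresponds to the test(1,2) benchmark case.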
Very neat.

> The only case where we are still a bit slower than Python is when you pass
> (almost) all arguments as keyword arguments. I suspect that it would be faster
> in this case to iterate over the dictionary and compare the keys with the
> argument names than to ask for each key separately. Which strategy to choose
> could be decided at runtime if the size of the keyword dictionary is close to
> the total number of arguments. However, as the absolute numbers compared to
> Python show, this may not be worth the effort.

Maybe. It wouldn't be a huge savings nor a common case, and there is the issue of bloating the header of the function, so I think what you have is great.

> Have fun,
>
> Stefan
>
>
> -----------------------------
> def generic(*args, **kwargs):
>     result = (args, kwargs)
>
> def partially_generic(a, b, *args, **kwargs):
>     result = (a, b, args, kwargs)
>
> def args(*args):
>     result = (args,)
>
> def positional(a,b,c,d):
>     result = (a,b,c,d)
>
> def optional(a=1, b=2, c=3, d=4):
>     result = (a,b,c,d)
> -----------------------------
>
> $ TEST=optional; python2.6 -m timeit "test(1,2)"
> callbench_python
> 1000000 loops, best of 3: 1.29 usec per loop
> callbench_devel
> 1000000 loops, best of 3: 0.603 usec per loop
> callbench_release
> 1000000 loops, best of 3: 0.61 usec per loop
>
> $ TEST=args; python2.6 -m timeit "test(1,2)"
> callbench_python
> 1000000 loops, best of 3: 1.09 usec per loop
> callbench_devel
> 1000000 loops, best of 3: 0.547 usec per loop
> callbench_release
> 1000000 loops, best of 3: 0.568 usec per loop
>
> $ TEST=optional; python2.6 -m timeit "test(1,2,3,d=4)"
> callbench_python
> 1000000 loops, best of 3: 1.5 usec per loop
> callbench_devel
> 1000000 loops, best of 3: 1.25 usec per loop
> callbench_release
> 1000000 loops, best of 3: 1.99 usec per loop
>
> $ TEST=positional; python2.6 -m timeit "test(1,2,3,d=4)"
> callbench_python
> 1000000 loops, best of 3: 1.44 usec per loop
> callbench_devel
> 1000000 loops, best of 3: 1.23 usec per loop
> callbench_release
> 100000 loops, best of 3: 2.01 usec per loop
>
> $ TEST=positional; python2.6 -m timeit "test(a=1,b=2,c=3,d=4)"
> callbench_python
> 1000000 loops, best of 3: 1.73 usec per loop
> callbench_devel
> 1000000 loops, best of 3: 1.81 usec per loop
> callbench_release
> 100000 loops, best of 3: 2.41 usec per loop
>
> $ TEST=generic; python2.6 -m timeit "test(a=1,b=2,c=3,d=4)"
> callbench_python
> 100000 loops, best of 3: 2.15 usec per loop
> callbench_devel
> 100000 loops, best of 3: 2.48 usec per loop
> callbench_release
> 100000 loops, best of 3: 2.56 usec per loop
>
> $ TEST=partially_generic; python2.6 -m timeit "test(a=1,b=2,c=3,d=4)"
> callbench_python
> 100000 loops, best of 3: 2.51 usec per loop
> callbench_devel
> 100000 loops, best of 3: 2.96 usec per loop
> callbench_release
> 100000 loops, best of 3: 3.42 usec per loop
>
> $ TEST=partially_generic; python2.6 -m timeit "test(1,b=2,c=3,d=4)"
> callbench_python
> 100000 loops, best of 3: 2.25 usec per loop
> callbench_devel
> 100000 loops, best of 3: 2.52 usec per loop
> callbench_release
> 100000 loops, best of 3: 3.1 usec per loop
>
> $ TEST=partially_generic; python2.6 -m timeit "test(1,2,3,d=4)"
> callbench_python
> 100000 loops, best of 3: 2.01 usec per loop
> callbench_devel
> 100000 loops, best of 3: 2.06 usec per loop
> callbench_release
> 100000 loops, best of 3: 2.46 usec per loop
> _______________________________________________
> Cython-dev mailing list
> [email protected]
> http://codespeak.net/mailman/listinfo/cython-dev
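As a footnote to the keyword-argument discussion above, the two lookup strategies and the runtime choice between them could look roughly like this in Python. It is a sketch under assumptions, not code from Cython: the helper names, the "slack" threshold, and the generic error messages are all invented for illustration, and the signature is the positional(a, b, c, d) benchmark function called with keywords only.

```python
NAMES = ("a", "b", "c", "d")  # argument names of positional(a, b, c, d)
_MISSING = object()           # marker for "no value received yet"


def fill_by_lookup(values, kwargs):
    """Ask the keyword dict for each argument name separately."""
    found = 0
    for i, name in enumerate(NAMES):
        if name in kwargs:
            values[i] = kwargs[name]
            found += 1
    return found


def fill_by_iteration(values, kwargs):
    """Walk the keyword dict once and compare each key with the argument names."""
    found = 0
    for key, value in kwargs.items():
        for i, name in enumerate(NAMES):
            if key == name:
                values[i] = value
                found += 1
                break
    return found


def unpack_keywords(kwargs, slack=1):
    """Pick a strategy at runtime; 'slack' is a hypothetical tuning knob."""
    values = [_MISSING] * len(NAMES)
    if len(kwargs) >= len(NAMES) - slack:
        # Dict size is close to the number of arguments: iterate over it.
        found = fill_by_iteration(values, kwargs)
    else:
        found = fill_by_lookup(values, kwargs)
    if found != len(kwargs):
        raise TypeError("unexpected keyword argument")
    if any(v is _MISSING for v in values):
        raise TypeError("missing required argument")
    return tuple(values)
```

Both helpers fill the same slots, so the choice only affects speed. Whether the extra branch pays for itself is exactly the open question in the thread, and this sketch makes no claim either way.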
