On Aug 23, 2008, at 9:00 AM, Stefan Behnel wrote:

> Hi,
>
> I found some time to do a pretty complete rewrite of the argument
> unpacking code, i.e. the code that unpacks positional args and keyword
> arguments from the Python calling convention into the local variables
> of a function. Previously, we were using PyArg_ParseTupleAndKeywords
> with a couple of added optimisations for common cases. I replaced this
> with a dedicated implementation that is generated specifically for the
> signature of the function. It uses extremely well optimisable switch
> statements in the C code (and no loops unless you use *args/**kwargs
> or in some of the error cases).
>
> The result is that Cython now generates function init code that is
> faster than Python in most cases, even in many cases where you pass
> keyword arguments (which wasn't the case before at all). It's actually
> a lot faster than the last Cython release, especially for keyword
> arguments, and quite a bit faster than Python, which has impressively
> fast generic argument handling code. I attached some numbers below.
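
To make the idea of a signature-specific unpacker concrete, here is a minimal Python sketch for the `positional(a, b, c, d)` signature from the benchmark. It is not Cython's generated C code: the function name `unpack_positional`, the `_MISSING` sentinel and the error messages are hypothetical. It only shows the stages such an unpacker goes through: take the positional arguments by count, look up each remaining name in the keyword dict, and fall back to a slower error path only when some keyword was not consumed.

```python
_MISSING = object()  # hypothetical sentinel meaning "argument not supplied"


def unpack_positional(args, kwargs):
    """Bind (args, kwargs) to the fixed signature (a, b, c, d).

    Hypothetical sketch written with Python loops for brevity; a generated
    C unpacker could unroll these steps for one known signature.
    """
    names = ("a", "b", "c", "d")
    npos = len(args)
    if npos > len(names):
        raise TypeError("expected at most %d positional arguments, got %d"
                        % (len(names), npos))
    # Positional part: copy the npos leading values. Generated C code could
    # do this with a fall-through switch on npos instead of a loop.
    values = list(args) + [_MISSING] * (len(names) - npos)
    if kwargs:
        # Keyword part: one lookup per argument not filled positionally.
        found = 0
        for i in range(npos, len(names)):
            value = kwargs.get(names[i], _MISSING)
            if value is not _MISSING:
                values[i] = value
                found += 1
        if found != len(kwargs):
            # Error path only: some key was a duplicate or an unknown name.
            for key in kwargs:
                if key in names[:npos]:
                    raise TypeError("got multiple values for argument %r" % key)
                if key not in names:
                    raise TypeError("got an unexpected keyword argument %r" % key)
    for name, value in zip(names, values):
        if value is _MISSING:
            raise TypeError("missing required argument %r" % name)
    return tuple(values)
```

For example, `unpack_positional((1, 2, 3), {"d": 4})` binds the same values as the benchmark call `test(1,2,3,d=4)`. Counting consumed keywords keeps the common, valid case free of any scan over the dict.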

Very neat.

>
> The only case where we are still a bit slower than Python is when you
> pass (almost) all arguments as keyword arguments. I suspect that in
> this case it would be faster to iterate over the dictionary and compare
> the keys with the argument names than to ask for each key separately.
> The strategy could be chosen at runtime, based on whether the size of
> the keyword dictionary is close to the total number of arguments.
> However, as the absolute numbers compared to Python show, this may not
> be worth the effort.
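
The two keyword-matching strategies discussed above, and a runtime choice between them, can be sketched in Python as follows. All names here (`lookup_each_name`, `scan_dict`, `match_keywords`) and the `slack` threshold are hypothetical illustrations, not Cython code, and error handling for unknown or duplicate keys is deliberately omitted. Python-level timings of these functions would say little about the C-level trade-off; the sketch only shows the control flow.

```python
def lookup_each_name(names, npos, kwargs):
    """Strategy 1: one dict lookup per argument not given positionally."""
    matched = {}
    for name in names[npos:]:
        if name in kwargs:
            matched[name] = kwargs[name]
    return matched


def scan_dict(names, npos, kwargs):
    """Strategy 2: iterate over the dict once, comparing each key with the names."""
    candidates = names[npos:]
    matched = {}
    for key, value in kwargs.items():
        for name in candidates:
            if key == name:
                matched[name] = value
                break
    return matched


def match_keywords(names, npos, kwargs, slack=1):
    # Hypothetical runtime switch: scan the dict when it holds (almost) as
    # many entries as the function has arguments, else look names up one by one.
    if len(kwargs) >= len(names) - slack:
        return scan_dict(names, npos, kwargs)
    return lookup_each_name(names, npos, kwargs)
```

Both strategies return the same mapping for valid calls, so the switch only affects speed; the design question raised in the reply is whether the extra code in every function header pays for itself.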

Maybe. It wouldn't be a huge savings nor a common case, and there is  
the issue of bloating the header of the function, so I think what you  
have is great.

>
> Have fun,
>
> Stefan
>
>
> -----------------------------
> def generic(*args, **kwargs):
>     result = (args, kwargs)
>
> def partially_generic(a, b, *args, **kwargs):
>     result = (a, b, args, kwargs)
>
> def args(*args):
>     result = (args,)
>
> def positional(a,b,c,d):
>     result = (a,b,c,d)
>
> def optional(a=1, b=2, c=3, d=4):
>     result = (a,b,c,d)
> -----------------------------
>
> $ TEST=optional; python2.6 -m timeit "test(1,2)"
> callbench_python
> 1000000 loops, best of 3: 1.29 usec per loop
> callbench_devel
> 1000000 loops, best of 3: 0.603 usec per loop
> callbench_release
> 1000000 loops, best of 3: 0.61 usec per loop
>
> $ TEST=args; python2.6 -m timeit "test(1,2)"
> callbench_python
> 1000000 loops, best of 3: 1.09 usec per loop
> callbench_devel
> 1000000 loops, best of 3: 0.547 usec per loop
> callbench_release
> 1000000 loops, best of 3: 0.568 usec per loop
>
> $ TEST=optional; python2.6 -m timeit "test(1,2,3,d=4)"
> callbench_python
> 1000000 loops, best of 3: 1.5 usec per loop
> callbench_devel
> 1000000 loops, best of 3: 1.25 usec per loop
> callbench_release
> 1000000 loops, best of 3: 1.99 usec per loop
>
> $ TEST=positional; python2.6 -m timeit "test(1,2,3,d=4)"
> callbench_python
> 1000000 loops, best of 3: 1.44 usec per loop
> callbench_devel
> 1000000 loops, best of 3: 1.23 usec per loop
> callbench_release
> 100000 loops, best of 3: 2.01 usec per loop
>
> $ TEST=positional; python2.6 -m timeit "test(a=1,b=2,c=3,d=4)"
> callbench_python
> 1000000 loops, best of 3: 1.73 usec per loop
> callbench_devel
> 1000000 loops, best of 3: 1.81 usec per loop
> callbench_release
> 100000 loops, best of 3: 2.41 usec per loop
>
> $ TEST=generic; python2.6 -m timeit "test(a=1,b=2,c=3,d=4)"
> callbench_python
> 100000 loops, best of 3: 2.15 usec per loop
> callbench_devel
> 100000 loops, best of 3: 2.48 usec per loop
> callbench_release
> 100000 loops, best of 3: 2.56 usec per loop
>
> $ TEST=partially_generic; python2.6 -m timeit "test(a=1,b=2,c=3,d=4)"
> callbench_python
> 100000 loops, best of 3: 2.51 usec per loop
> callbench_devel
> 100000 loops, best of 3: 2.96 usec per loop
> callbench_release
> 100000 loops, best of 3: 3.42 usec per loop
>
> $ TEST=partially_generic; python2.6 -m timeit "test(1,b=2,c=3,d=4)"
> callbench_python
> 100000 loops, best of 3: 2.25 usec per loop
> callbench_devel
> 100000 loops, best of 3: 2.52 usec per loop
> callbench_release
> 100000 loops, best of 3: 3.1 usec per loop
>
> $ TEST=partially_generic; python2.6 -m timeit "test(1,2,3,d=4)"
> callbench_python
> 100000 loops, best of 3: 2.01 usec per loop
> callbench_devel
> 100000 loops, best of 3: 2.06 usec per loop
> callbench_release
> 100000 loops, best of 3: 2.46 usec per loop
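
The commands above elide how `test` is bound to each `callbench_*` module. As a hedged sketch, the pure-Python baseline for a few of the cases can be re-measured with the standard `timeit` module on a current Python 3 interpreter; `best_usec` and `CASES` are hypothetical names, and the resulting numbers will not match the 2008 Python 2.6 figures. Comparing against Cython would mean importing the compiled module's functions instead of the plain definitions below.

```python
import timeit


def positional(a, b, c, d):
    result = (a, b, c, d)


def optional(a=1, b=2, c=3, d=4):
    result = (a, b, c, d)


def best_usec(stmt, func, number=100000, repeat=3):
    """Best-of-`repeat` time per call in microseconds, binding `func` as `test`."""
    times = timeit.repeat(stmt, globals={"test": func},
                          number=number, repeat=repeat)
    return min(times) / number * 1e6


# Hypothetical selection of the calls benchmarked in the thread.
CASES = [
    (optional, "test(1,2)"),
    (positional, "test(1,2,3,d=4)"),
    (positional, "test(a=1,b=2,c=3,d=4)"),
]

if __name__ == "__main__":
    for func, stmt in CASES:
        print("%-12s %-24s %.3f usec per call"
              % (func.__name__, stmt, best_usec(stmt, func)))
```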
> _______________________________________________
> Cython-dev mailing list
> [email protected]
> http://codespeak.net/mailman/listinfo/cython-dev