On Mon, Mar 16, 2009 at 6:28 AM, Stefan Behnel <[email protected]> wrote:
> So, yes, there is a performance difference of up to 30% even for the
> fastest (BTW, branching) implementation. For a constant power-of-2 divisor
> (m=16), the difference is about 17% for me:
>
> ./cmod
> -1
> real 0m3.316s
> user 0m3.268s
> sys 0m0.000s
>
> ./py2mod
> -589934593
> real 0m3.880s
> user 0m3.868s
> sys 0m0.000s
>
> ./pymod
> -589934593
> real 0m4.634s
> user 0m4.580s
> sys 0m0.000s

But note that for constant power-of-2 divisors, getting Python semantics
by using a bitmask is actually faster than getting C semantics:

cwi...@magnetar:/tmp$ time ./cmod
-1
real 0m1.875s
user 0m1.872s
sys 0m0.004s

cwi...@magnetar:/tmp$ time ./py3mod
-589934593
real 0m1.214s
user 0m1.212s
sys 0m0.000s

This is because Python semantics is a single AND instruction, but C
semantics uses AND plus some fixups to get the negative result. (So
your pymod and py2mod above are using an AND, subtracting to get the
negative result, and then adding to undo the subtractions.)

Carl
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
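
For reference, the three variants discussed above can be sketched roughly as follows. This is only an illustration: the function names (c_mod, py_mod, py_mod_pow2) are made up here and are not the source of the cmod, pymod, py2mod, or py3mod benchmark programs, which does not appear in this message. The bitmask version assumes a two's complement int and a positive power-of-2 divisor.

```c
#include <assert.h>

/* C semantics (C99 and later): the result takes the sign of the dividend,
   so c_mod(-1, 16) is -1. */
int c_mod(int x, int m) {
    return x % m;
}

/* Python semantics for a general nonzero m: the result takes the sign of
   the divisor. This is a branching fixup on top of C's %. */
int py_mod(int x, int m) {
    int r = x % m;
    if (r != 0 && ((r < 0) != (m < 0)))
        r += m;
    return r;
}

/* Python semantics for a positive power-of-2 divisor: a single AND.
   Assumption: int uses two's complement representation. */
int py_mod_pow2(int x, int m) {
    assert(m > 0 && (m & (m - 1)) == 0);  /* m must be a power of 2 */
    return x & (m - 1);
}
```

With m=16, c_mod(-1, 16) gives -1 while py_mod(-1, 16) and py_mod_pow2(-1, 16) both give 15. When the compiler has to produce the C result for a signed x with a constant power-of-2 divisor, it cannot use the mask alone, which is where the extra fixup instructions come from.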
