Dag Sverre Seljebotn wrote:
> Gabriel Gellner wrote:
>> So I am giving a talk to my lab about doing some fast ODE solving using
>> python. Traditionally I have used f2py to define the callback function,
>> but I
>> think Cython is a better fit for some of the newer students that don't
>> know
>> fortran (though in this case it is easy to teach them).
>>
>> Now using the cython/numpy tutorial I can get my code to run around 50%
>> slower
>> than the f2py generated code (which is still a healthy order of
>> magnitude
>> faster than the python callback . . .). If there are any easy things I
>> can do
>> to make the code faster (without sacrificing readability) I would be
>> very
>> grateful!
>>
>> The callback code is (if you want the full program just ask, I am asking
>> more
>> for glaring errors, as opposed to subtle optimizations):
>>
>> cdef class Model:
>>
>>     cdef public double a1, a2, b1, b2, d1, d2
>>
>>     def __call__(self, np.ndarray[np.float_t, ndim=1] y, int t):
>>         cdef np.ndarray[np.float_t, ndim=1] yprime = np.empty(3)
>>
>>         yprime[0] = y[0]*(1.0 - y[0]) - self.a1*y[0]*y[1]/(1.0 + self.b1*y[0])
>>         yprime[1] = self.a1*y[0]*y[1]/(1.0 + self.b1*y[0]) - self.a2*y[1]*y[2]/(1.0 + self.b2*y[1]) - self.d1*y[1]
>>         yprime[2] = self.a2*y[1]*y[2]/(1.0 + self.b2*y[1]) - self.d2*y[2]
>>
>>         return yprime
>>
>
> The amount of work that is done in this function is almost nothing -- i.e.
> "n" is hard-coded to 3. So I think you'll find that the thing killing
> performance here is calling the function and passing the arguments.
>
> For starters, use typed polymorphism: make the function "cpdef" and give
> it another name (special methods like __call__ cannot be cpdef), define a
> parent class "AbstractModel" with the same method, and in the calling
> code type the callee as AbstractModel.
>
> After that, since n is so small, it would help to pass around raw float*
> rather than NumPy objects (unfortunately, there's no way to pass an
> acquired buffer between functions; I have ideas of course, but they are
> not implemented).
>
> Without knowing the nature of the caller (which is where the real
> bottleneck likely is), it is difficult to give better advice.
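
To make the first suggestion concrete, here is a rough, untested sketch of
what I mean (the method name rhs, the class AbstractModel and the
euler_demo driver are made up purely for illustration; the equations are
just the ones from your __call__):

# model.pyx -- sketch of the typed-polymorphism idea (untested)
cimport numpy as np
import numpy as np

cdef class AbstractModel:
    # cpdef: still callable from Python, but Cython call sites that are
    # typed as AbstractModel dispatch through a C-level vtable entry.
    cpdef np.ndarray rhs(self, np.ndarray[np.float_t, ndim=1] y, int t):
        raise NotImplementedError

cdef class Model(AbstractModel):
    cdef public double a1, a2, b1, b2, d1, d2

    cpdef np.ndarray rhs(self, np.ndarray[np.float_t, ndim=1] y, int t):
        cdef np.ndarray[np.float_t, ndim=1] yprime = np.empty(3)
        yprime[0] = y[0]*(1.0 - y[0]) - self.a1*y[0]*y[1]/(1.0 + self.b1*y[0])
        yprime[1] = (self.a1*y[0]*y[1]/(1.0 + self.b1*y[0])
                     - self.a2*y[1]*y[2]/(1.0 + self.b2*y[1])
                     - self.d1*y[1])
        yprime[2] = self.a2*y[1]*y[2]/(1.0 + self.b2*y[1]) - self.d2*y[2]
        return yprime

def euler_demo(AbstractModel m, np.ndarray[np.float_t, ndim=1] y0,
               int nsteps, double dt):
    # Because m is typed as AbstractModel, m.rhs(...) below is a C-level
    # call rather than a Python attribute lookup plus argument packing.
    cdef np.ndarray[np.float_t, ndim=1] y = y0.copy()
    cdef int i
    for i in range(nsteps):
        y = y + dt * m.rhs(y, i)
    return y

The point is simply that the call site is statically typed, so the
dispatch happens in C; subclasses still override the method as usual.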
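
And a correspondingly rough sketch of the raw float* variant (again
untested, names made up). Here the caller owns the storage, the buffers
are acquired once, and each step is then a plain C call with no NumPy
object creation:

# model_raw.pyx -- sketch of passing raw pointers instead (untested)
cimport numpy as np
import numpy as np

cdef class AbstractModel:
    # cdef rather than cpdef: pointer arguments can't be exposed to Python.
    cdef void rhs_raw(self, double* y, int t, double* yprime):
        pass

cdef class Model(AbstractModel):
    cdef public double a1, a2, b1, b2, d1, d2

    cdef void rhs_raw(self, double* y, int t, double* yprime):
        yprime[0] = y[0]*(1.0 - y[0]) - self.a1*y[0]*y[1]/(1.0 + self.b1*y[0])
        yprime[1] = (self.a1*y[0]*y[1]/(1.0 + self.b1*y[0])
                     - self.a2*y[1]*y[2]/(1.0 + self.b2*y[1])
                     - self.d1*y[1])
        yprime[2] = self.a2*y[1]*y[2]/(1.0 + self.b2*y[1]) - self.d2*y[2]

def euler_demo_raw(AbstractModel m, np.ndarray[np.float_t, ndim=1] y0,
                   int nsteps, double dt):
    cdef np.ndarray[np.float_t, ndim=1] y = y0.copy()
    cdef np.ndarray[np.float_t, ndim=1] yprime = np.empty(3)
    cdef int i, j
    for i in range(nsteps):
        # The buffers were acquired once above; this is a plain C call.
        m.rhs_raw(&y[0], i, &yprime[0])
        for j in range(3):
            y[j] = y[j] + dt * yprime[j]
    return y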

However, I'll also note that the usual way to get high performance with
NumPy is to reformulate the computation so that you do it all in
parallel. If you are evaluating this on a grid with n=1000 or similar,
then as long as you code things so that all operations happen "in
parallel", you could code it in pure Python if you wished and it would
still be at least comparable to Fortran.

I.e., to avoid all the per-call overhead, it pays off to do:

gridvalues = eval_func(gridpts)

than

for i in range(..):
    gridvalues[i] = eval_func(gridpts[i])

and continue in the same fashion all the way down. Wherever your loop is,
there's your problem...
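
For instance, if what you actually need is the right-hand side for a
whole batch of states at once (say an ensemble of n=1000 of them), a pure
Python/NumPy version could look roughly like this (untested; that Y holds
one state per row is my assumption, not something from your code):

import numpy as np

def rhs_all(Y, a1, a2, b1, b2, d1, d2):
    # Y has shape (n, 3): one state vector per row.
    y0, y1, y2 = Y[:, 0], Y[:, 1], Y[:, 2]
    f01 = a1*y0*y1/(1.0 + b1*y0)   # same terms as in __call__, but
    f12 = a2*y1*y2/(1.0 + b2*y1)   # computed for all n states at once
    Yprime = np.empty_like(Y)
    Yprime[:, 0] = y0*(1.0 - y0) - f01
    Yprime[:, 1] = f01 - f12 - d1*y1
    Yprime[:, 2] = f12 - d2*y2
    return Yprime

Each assignment is then a single NumPy operation over all n states, so
the Python overhead is paid once per line rather than once per state.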

Dag Sverre
