On Nov 25, 2008, at 7:45 AM, Dag Sverre Seljebotn wrote:

> Gabriel Gellner wrote:
>> So I am giving a talk to my lab about doing some fast ODE solving using
>> python. Traditionally I have used f2py to define the callback function,
>> but I think Cython is a better fit for some of the newer students that
>> don't know fortran (though in this case it is easy to teach them).
>>
>> Now using the cython/numpy tutorial I can get my code to run around 50%
>> slower than the f2py generated code (which is still a healthy order of
>> magnitude faster than the python callback . . .). If there are any easy
>> things I can do to make the code faster (without sacrificing
>> readability) I would be very grateful!
>>
>> The callback code is (if you want the full program just ask, I am asking
>> more for glaring errors, as opposed to subtle optimizations):
>>
>> cdef class Model:
>>
>>     cdef public double a1, a2, b1, b2, d1, d2
>>
>>     def __call__(self, np.ndarray[np.float_t, ndim=1] y, int t):
>>         cdef np.ndarray[np.float_t, ndim=1] yprime = np.empty(3)
>>
>>         yprime[0] = y[0]*(1.0 - y[0]) - self.a1*y[0]*y[1]/(1.0 + self.b1*y[0])
>>         yprime[1] = (self.a1*y[0]*y[1]/(1.0 + self.b1*y[0])
>>                      - self.a2*y[1]*y[2]/(1.0 + self.b2*y[1]) - self.d1*y[1])
>>         yprime[2] = self.a2*y[1]*y[2]/(1.0 + self.b2*y[1]) - self.d2*y[2]
>>
>>         return yprime
>>
>
> The amount of work that is done in this function is almost nothing --
> i.e. "n" is hard-coded to 3. So I think you'll find that the thing killing
> performance here is calling the function and passing the arguments.
>
> For starters, use typed polymorphism: Make the function "cpdef" and give
> it another name, have a parent class "AbstractModel" with the same
> function in it, and in the calling code type the callee as AbstractModel.
>
> After that it would help to pass around raw float* rather than NumPy
> objects in this case when n is so small (unfortunately, there's no way to
> pass around an acquired buffer between functions. I have ideas of course,
> but they are not implemented.)


One way to do it would be to flatten/compact the ndarray early on (so
one knows the entries are contiguous) and then pass the raw float* and
its length (keeping the original array around so you don't have to
worry about memory issues).
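
As a rough illustration of the buffer-plus-length idea, here is a sketch in plain Python rather than Cython. The state is copied into a contiguous block of doubles once, and the callback then only sees a flat buffer, an output buffer, and a length. The names rhs_into and PARAMS and the parameter values are made up for the example; in Cython the two buffers would be raw pointer arguments of a cdef function.

```python
from array import array

# Hypothetical parameter values, chosen only to make the example run.
PARAMS = {"a1": 5.0, "a2": 0.1, "b1": 3.0, "b2": 2.0, "d1": 0.4, "d2": 0.01}


def rhs_into(y, out, n, p):
    """Write the right-hand side into `out`; `y` and `out` are flat buffers of length n."""
    assert n == 3  # the model is hard-coded to three equations
    f1 = p["a1"] * y[0] * y[1] / (1.0 + p["b1"] * y[0])
    f2 = p["a2"] * y[1] * y[2] / (1.0 + p["b2"] * y[1])
    out[0] = y[0] * (1.0 - y[0]) - f1
    out[1] = f1 - f2 - p["d1"] * y[1]
    out[2] = f2 - p["d2"] * y[2]


# Flatten once, up front: array("d", ...) stores contiguous C doubles.
state = array("d", [0.5, 0.5, 2.0])
yprime = array("d", [0.0] * len(state))  # output buffer, allocated once and reused

# The memoryviews stand in for the raw pointers; `state` and `yprime`
# are kept around for as long as the views are in use.
y_view = memoryview(state)
out_view = memoryview(yprime)
assert y_view.contiguous and y_view.format == "d"

rhs_into(y_view, out_view, len(state), PARAMS)
```

Nothing is allocated inside the callback, which is the point of the exercise. Unlike a memoryview, a raw C pointer does not keep its owner alive, hence the advice to hold on to the original array.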

Note that __call__, though not as slow as a normal Python function
call, still has Python semantics (i.e. all of its arguments and its
return value have to pass through Python objects), so just using a
cdef method could speed things up considerably. (I wonder how hard it
would be to support cpdef __call__?) The call to empty is probably
dominating things too -- since the 3 seems hard-coded anyway, I
would accept (and return) a

cdef struct data:
    double x
    double y
    double z

instead of a 3-element ndarray.
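
To make the interface concrete, here is a plain-Python sketch (not Cython) of the same right-hand side taking and returning a fixed three-field record instead of an array. The record name Data, the method name rhs, and the parameter values are placeholders for illustration. Python floats are C doubles, so a Cython version matching the np.float_t arrays would presumably want double fields in the struct.

```python
from typing import NamedTuple


class Data(NamedTuple):
    """Fixed three-field record standing in for the C struct."""
    x: float
    y: float
    z: float


class Model:
    def __init__(self, a1, a2, b1, b2, d1, d2):
        self.a1, self.a2 = a1, a2
        self.b1, self.b2 = b1, b2
        self.d1, self.d2 = d1, d2

    def rhs(self, s, t):
        """Return the derivative as a Data record; t is unused by this model."""
        f1 = self.a1 * s.x * s.y / (1.0 + self.b1 * s.x)
        f2 = self.a2 * s.y * s.z / (1.0 + self.b2 * s.y)
        return Data(
            s.x * (1.0 - s.x) - f1,
            f1 - f2 - self.d1 * s.y,
            f2 - self.d2 * s.z,
        )


# Hypothetical parameter values, chosen only to make the example run.
model = Model(a1=5.0, a2=0.1, b1=3.0, b2=2.0, d1=0.4, d2=0.01)
deriv = model.rhs(Data(0.5, 0.5, 2.0), 0.0)
```

In Python this still creates an object per call, so it only shows the shape of the interface. The gain in Cython would come from the struct being passed and returned by value from a cdef method, with no Python objects involved.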

- Robert


_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev