On Nov 25, 2008, at 7:45 AM, Dag Sverre Seljebotn wrote:
> Gabriel Gellner wrote:
>> So I am giving a talk to my lab about doing some fast ODE solving using
>> Python. Traditionally I have used f2py to define the callback function,
>> but I think Cython is a better fit for some of the newer students who
>> don't know Fortran (though in this case it is easy to teach them).
>>
>> Now, using the Cython/NumPy tutorial, I can get my code to run around
>> 50% slower than the f2py-generated code (which is still a healthy order
>> of magnitude faster than the Python callback...). If there are any easy
>> things I can do to make the code faster (without sacrificing
>> readability) I would be very grateful!
>>
>> The callback code is below (if you want the full program just ask; I am
>> asking more for glaring errors than for subtle optimizations):
>>
>> cimport numpy as np
>> import numpy as np
>>
>> cdef class Model:
>>
>>     cdef public double a1, a2, b1, b2, d1, d2
>>
>>     def __call__(self, np.ndarray[np.float_t, ndim=1] y, double t):
>>         cdef np.ndarray[np.float_t, ndim=1] yprime = np.empty(3)
>>
>>         yprime[0] = y[0]*(1.0 - y[0]) - self.a1*y[0]*y[1]/(1.0 + self.b1*y[0])
>>         yprime[1] = (self.a1*y[0]*y[1]/(1.0 + self.b1*y[0])
>>                      - self.a2*y[1]*y[2]/(1.0 + self.b2*y[1]) - self.d1*y[1])
>>         yprime[2] = self.a2*y[1]*y[2]/(1.0 + self.b2*y[1]) - self.d2*y[2]
>>
>>         return yprime
>>
>
> The amount of work done in this function is almost nothing, since "n"
> is hard-coded to 3. So I think you'll find that what is killing
> performance here is calling the function and passing the arguments.
>
> For starters, use typed polymorphism: make the function "cpdef" and
> give it another name, have a parent class "AbstractModel" with the same
> function in it, and in the calling code type the callee as
> AbstractModel.
>
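A minimal sketch of what that suggestion could look like, assuming a recent Cython with NumPy's `cimport numpy` declarations. `AbstractModel` is the name suggested above; the method name `rhs` and the `euler` function (a toy fixed-step loop standing in for the real solver) are hypothetical, made up for illustration:

```cython
# cython: language_level=3
# Sketch of "typed polymorphism": callers hold an AbstractModel reference,
# so model.rhs(...) is dispatched through Cython's vtable, not Python's __call__.
cimport numpy as np
import numpy as np

np.import_array()


cdef class AbstractModel:
    # Interface only; "rhs" is a hypothetical method name.
    cpdef np.ndarray rhs(self, np.ndarray y, double t):
        raise NotImplementedError


cdef class Model(AbstractModel):
    cdef public double a1, a2, b1, b2, d1, d2

    cpdef np.ndarray rhs(self, np.ndarray y, double t):
        # Re-acquire y as a typed buffer for fast element access.
        cdef np.ndarray[np.float64_t, ndim=1] yb = y
        cdef np.ndarray[np.float64_t, ndim=1] yprime = np.empty(3)
        # Each shared term is computed once instead of twice.
        cdef double f1 = self.a1*yb[0]*yb[1]/(1.0 + self.b1*yb[0])
        cdef double f2 = self.a2*yb[1]*yb[2]/(1.0 + self.b2*yb[1])
        yprime[0] = yb[0]*(1.0 - yb[0]) - f1
        yprime[1] = f1 - f2 - self.d1*yb[1]
        yprime[2] = f2 - self.d2*yb[2]
        return yprime


def euler(AbstractModel model, y0, double dt, int nsteps):
    # Hypothetical toy fixed-step loop standing in for the real ODE solver.
    cdef np.ndarray y = np.array(y0, dtype=np.float64)
    cdef double t = 0.0
    cdef int i
    for i in range(nsteps):
        y = y + dt * model.rhs(y, t)  # typed call site: C-level dispatch
        t += dt
    return y
```

Because `euler` declares its argument as `AbstractModel`, the call to `model.rhs(...)` compiles to a C-level virtual call rather than a Python attribute lookup; the array arguments and return value are still Python objects, though.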
> After that it would help to pass around raw float* rather than NumPy
> objects in this case, when n is so small. (Unfortunately, there's no
> way to pass around an acquired buffer between functions. I have ideas,
> of course, but they are not implemented.)
One way to do it would be to flatten/compact the ndarray early on (so
one knows the entries are contiguous) and then pass the raw float* and
its length, keeping the original array around so you don't have to
worry about memory issues.

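A rough sketch of that idea, assuming Cython's typed memoryviews are available: the state is copied into one flat, C-contiguous float64 array up front, and the model then sees only a pointer and a length. It uses `double*` rather than `float*`, since `np.float_t` is a C double. `PointerModel`, `rhs` and `euler` are hypothetical names:

```cython
# cython: language_level=3
# Sketch: compact the state into one C-contiguous float64 array up front,
# then hand the model only a raw double* and a length.
import numpy as np


cdef class PointerModel:
    cdef public double a1, a2, b1, b2, d1, d2

    cdef void rhs(self, const double* y, double* yprime, Py_ssize_t n, double t):
        # This particular model ignores t and assumes n == 3 (checked by the caller).
        cdef double f1 = self.a1*y[0]*y[1]/(1.0 + self.b1*y[0])
        cdef double f2 = self.a2*y[1]*y[2]/(1.0 + self.b2*y[1])
        yprime[0] = y[0]*(1.0 - y[0]) - f1
        yprime[1] = f1 - f2 - self.d1*y[1]
        yprime[2] = f2 - self.d2*y[2]


def euler(PointerModel model, y0, double dt, int nsteps):
    # Hypothetical toy fixed-step loop standing in for the real ODE solver.
    # Copy and flatten once; ravel() returns a C-contiguous 1-D array.
    state = np.array(y0, dtype=np.float64).ravel()
    deriv = np.empty_like(state)
    cdef double[::1] y = state      # views share memory with the arrays,
    cdef double[::1] dy = deriv     # which stay alive for the whole loop
    cdef Py_ssize_t n = y.shape[0]
    cdef Py_ssize_t j
    cdef double t = 0.0
    cdef int i
    if n != 3:
        raise ValueError("PointerModel expects exactly 3 state variables")
    for i in range(nsteps):
        model.rhs(&y[0], &dy[0], n, t)  # plain C call: pointers, no objects
        for j in range(n):
            y[j] += dt * dy[j]
        t += dt
    return state
```

Holding `state` and `deriv` as ordinary Python references for the whole call is what keeps the raw pointers valid; the model itself never owns or frees memory.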
Note that __call__, though not as slow as a normal Python function
call, still has Python semantics (i.e. all of its arguments and its
return value have to pass through Python objects), so just using a
cdef method could speed things up considerably. (I wonder how hard it
would be to support cpdef __call__?) The call to np.empty() is probably
dominating things too. Since the 3 seems to be hard-coded anyway, I
would accept (and return) a

    cdef struct data:
        double x
        double y
        double z

instead of a 3-element ndarray.
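A minimal sketch of the struct version, assuming the three state variables map to fields `x`, `y`, `z` and using `double` fields so nothing is lost relative to `np.float_t`. `Vec3`, `StructModel`, `rhs` and `euler` are hypothetical names, and `euler` is only a toy fixed-step loop standing in for the real solver:

```cython
# cython: language_level=3
# Sketch: pass the 3-component state by value as a C struct and call a cdef
# method, so nothing in the inner loop allocates or touches a Python object.

cdef struct Vec3:
    double x
    double y
    double z


cdef class StructModel:
    cdef public double a1, a2, b1, b2, d1, d2

    cdef Vec3 rhs(self, Vec3 s, double t):
        # Shared terms computed once; t is unused by this autonomous model.
        cdef double f1 = self.a1*s.x*s.y/(1.0 + self.b1*s.x)
        cdef double f2 = self.a2*s.y*s.z/(1.0 + self.b2*s.y)
        cdef Vec3 d
        d.x = s.x*(1.0 - s.x) - f1
        d.y = f1 - f2 - self.d1*s.y
        d.z = f2 - self.d2*s.z
        return d


def euler(StructModel model, double x, double y, double z, double dt, int nsteps):
    # Hypothetical toy fixed-step loop standing in for the real ODE solver.
    cdef Vec3 s
    cdef Vec3 d
    cdef double t = 0.0
    cdef int i
    s.x = x
    s.y = y
    s.z = z
    for i in range(nsteps):
        d = model.rhs(s, t)  # C call: struct in, struct out
        s.x += dt*d.x
        s.y += dt*d.y
        s.z += dt*d.z
        t += dt
    return (s.x, s.y, s.z)
```

The tradeoff is that the integrator now has to know the system has exactly three components; in exchange there is no per-call allocation at all.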
- Robert
_______________________________________________
Cython-dev mailing list
http://codespeak.net/mailman/listinfo/cython-dev