Gabriel Gellner wrote:
> On Tue, Nov 25, 2008 at 07:21:27PM +0100, Dag Sverre Seljebotn wrote:
>> Gabriel Gellner wrote:
>>> But the only difference in the caller between the f2py version and this one
>>> is this file. So the speed difference must be from the function calls etc as
>>> you have said, but I don't understand how the overhead can be from the 
>>> caller
>>> as the code for this is identical (and not compiled).
>>>
>>> the code is basically:
>>>
>>> for model.b1 in np.linspace(0, 2.6, 100):
>>>     odeint(model, y0, t0)
>>>
>>> where model is defined in a separate file, in one case as a fortran
>>> object generated by f2py and in this case by cython.
>> Ah, right.
>>
>> It's interesting (but not that surprising) to see that f2py does perform  
>> better in this area, due to both the call to np.empty and that acquiring  
>> the buffer from the NumPy array likely is slower than the NumPy-specific  
>> stuff that f2py is doing.
>>
>> But that Python snippet is likely going to use the majority of the time  
>> no matter how things are done, so it doesn't matter much. I'm interested  
>> to hear how much do you gain over pure Python in this case then --  
>> probably not too much, perhaps 2-300%? (Moving the entire for-loop to  
>> Cython would typically give you around 800% improvement).
>>
> The Cython callback version is 10 times faster than the pure python.

Sorry about the %-numbers above; I have no idea how that confused 
sentence got in there :-)

What I meant to say: For the simplest extreme examples, putting the loop 
on the Cython side gives roughly a 1000-fold speedup. Your example has a 
lot more work inside the loop, but I still have a feeling that a speedup 
in the range of 200 times is easily achievable if the entire solver is 
written in Cython.

However, you definitely raise some interesting points below. While a 
mere 10-fold speedup is not the kind of gain I wrote e.g. the buffer 
support for, of course, when you are sitting there wondering whether to 
bother with Cython or stick with MATLAB and Fortran, 10 times faster is, 
well, 10 times faster.
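To make the numbers concrete, here is a pure-Python sketch of the usage pattern being discussed; a toy fixed-step Euler integrator stands in for scipy's odeint, and the `Model` class with its `b1` parameter is hypothetical, modelled loosely on the snippet quoted above. The point it illustrates is that the callback is invoked once per step, so its call overhead is multiplied by (steps per solve) x (parameter values swept):

```python
import numpy as np

def euler(f, y0, t):
    # Toy stand-in for scipy.integrate.odeint: fixed-step forward Euler.
    # f is called once per step, which is where callback overhead adds up.
    y = np.asarray(y0, dtype=float)
    out = [y]
    for i in range(1, len(t)):
        y = y + (t[i] - t[i - 1]) * np.asarray(f(y, t[i - 1]))
        out.append(y)
    return np.array(out)

class Model:
    # Hypothetical right-hand side (logistic growth) with a tunable
    # parameter b1, mirroring the "model.b1" sweep in the quoted code.
    b1 = 1.0
    def __call__(self, y, t):
        return self.b1 * y * (1.0 - y)

model = Model()
t = np.linspace(0.0, 10.0, 1000)
results = []
for b1 in np.linspace(0, 2.6, 100):   # the parameter sweep from the example
    model.b1 = b1
    results.append(euler(model, [0.1], t))
# 100 solves x 999 steps each = ~10^5 Python-level callback invocations
```

With this many crossings of the Python call boundary, even a compiled callback body only removes part of the cost; the call dispatch itself remains.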

>> Anyway, this is not likely to be an area where Cython is improved, so  
>> there's nothing to do about it. I don't think it is a common usecase  
>> either -- typically either one keep everything in Python, or one decide  
>> that Python only is too slow, but then one doesn't want to only leverage  
>> a small part of the speed increase that compiled code can give you...
>>
> I don't think this is true. Speeding up callbacks (the loop in my code is
> not essential; odeint needs to call the provided function a lot, even without a
> Python loop) is very, very common. PyDSTool (a dynamical systems package) has
> even invented their own text format to do this. Unless I can expect that every
> Python routine I use has a Cython interface that accepts cdef-like functions
> (which for scipy [as far as I know] is currently not true at all) then I am
> stuck with Python callbacks or making my own wrappers (and not using the vast
> scipy ecosystem), which would mean I should just use C or Fortran alone. Using
> tools like f2py, weave, and recently Cython is the only major draw to Python I
> have to offer fellow researchers over MATLAB (as it makes the equivalent MEX
> construction seem extra painful). Heck, using this I can significantly speed up
> solvers written in pure Python that consume a callback. I find for most
> iterative solvers in Python the callback is the most expensive part (even for
> toy examples like I have given), especially with Cython to get rid of the loop
> overhead.

Very interesting perspective. Unfortunately I do not have many good 
ideas about what to do about it. (In some ways I am taking the 
perspective that SciPy may well grow a transparent Cython interface for 
callbacks at some point, so that callbacks which subclass a special 
SciPy parent class would be called in a fast way from SciPy code. I know 
that SciPy already uses Cython for some things.)

But one thing I forgot to mention is trying to use the "cast" flag on 
the buffer. I.e. write

np.ndarray[float, cast=True] arr

this will skip some checking of the dtype (and silently misinterpret the 
buffer, rather than raise an exception, if the array passed in has the 
wrong dtype...).
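For reference, a rough sketch of how the cast=True flag might look in a callback signature; the function name and body here are hypothetical, and the exact dtype-checking behaviour of cast=True should be verified against the Cython buffer documentation for your version:

```cython
import numpy as np
cimport numpy as np

def model(np.ndarray[np.float64_t, cast=True] y, double t):
    # cast=True skips part of the dtype check during buffer acquisition.
    # Caveat: if the caller passes an array of the wrong dtype, the data
    # is reinterpreted rather than an exception being raised.
    cdef np.ndarray[np.float64_t] dy = np.empty(y.shape[0])
    cdef Py_ssize_t i
    for i in range(y.shape[0]):
        dy[i] = -y[i]
    return dy
```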

Also, if you are really interested in this, have a look at the C source 
generated by Cython -- there's a lot of stuff going on in order to 
acquire access to the ndarray, and if you find that a particular check 
accounts for a lot of the time, we could easily add a flag to disable 
that check.

Also, I did once have plans for optimizing "np.empty" etc. so that no 
Python overhead would be necessary (through inlineable functions in 
numpy.pxd), but I haven't had time to do it.
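In the meantime, one workaround on the user side is to hoist the allocation out of the callback entirely: allocate the output array once and fill it in place on every call. This is a sketch (the `Model` class is hypothetical), and it assumes the solver does not hold on to the returned array between calls, since every call returns the same buffer:

```python
import numpy as np

class Model:
    # Hypothetical right-hand side that reuses one preallocated output
    # buffer instead of calling np.empty on every invocation.
    def __init__(self, n):
        self._dy = np.empty(n)   # allocated once, up front

    def __call__(self, y, t):
        # Fill the existing buffer in place; no per-call allocation.
        np.multiply(y, -1.0, out=self._dy)
        return self._dy

m = Model(3)
out = m(np.array([1.0, 2.0, 3.0]), 0.0)
```

Whether this is safe depends on the solver: odeint copies the values it needs at each step, but a solver that keeps references to past return values would see them all aliased to the same array.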

-- 
Dag Sverre
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev