>> Adam Ginsburg wrote:
>>> Much appreciated.  I guess the various levels of yellow in the html
>>> file indicate the slow lines?  I tried getting rid of all of my numpy
>>> calls in the loop by rewriting them as loops, but that hasn't improved
>>> speed at all, and in fact appears to have become slower.  Right now a
>>> fortran (f2py) version goes ~75% faster and pure python goes ~25%
>>> faster, so I must be doing something wrong.
>>
>> Not necessarily.
>>
>> If your Fortran code is only 2x faster than Python there's usually not
>> much Cython can do. Cython is for the times when Fortran is 1000x-2000x
>> faster than Python!

That makes sense.  I was very surprised to find that the fortran code
was only a few times faster than the python code; I suppose the
problem I'm trying to address must be intrinsically slow.  I have lots
of full-array operations, but they are being done within long for
loops (arrays of size ~n done ~n times...).

> (What's the size of your test data though? If f2py overhead comes into
> play then Fortran could really be faster, meaning more of a potential for
> Cython. I didn't really look at your code though.)

My test data set is 10^4 elements, which is typical for what I expect
to deal with but it could go up an order of magnitude.  Of course, I
need to do 10^4 element sets ~10^4 times each...
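
For scale, here is a back-of-the-envelope count of the element-level work those numbers imply. This is only a sketch: `element_ops` is a made-up helper, and it assumes the number of passes grows along with the array size (arrays of size ~n done ~n times).

```python
def element_ops(n_elements, n_passes):
    """Element-level operations for n_passes full-array passes."""
    return n_elements * n_passes

typical = element_ops(10**4, 10**4)  # 10^4 elements, ~10^4 passes
larger = element_ops(10**5, 10**5)   # both an order of magnitude larger

# The work grows quadratically: 10x more data means ~100x more operations.
print(typical, larger, larger // typical)
```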


> I'd add a mode='c' option to all the cnp.ndarray's -- this will speed up 
> access.
>
> A future optimization would be the @cython.boundscheck(False) directive.

Do you mean future as in "don't try to use it now" or "should use it
if it's safe to proceed without boundary checking"?  Also, this looks
like a decorator to me, but I couldn't compile when I put it in front
of my function definition.

> In the inner loops get rid of *all* python operations.  For example:
>
> line 63: z    = z[z>=xmin]
>    this is heavy on numpy operations (allocates & discards a temp
> boolean array every time, etc) and might kill performance.
>
> line 64:  n = float(...)  ==> n = <float>(...)  # replace a Python
> cast with a C-level cast

I think I came up with a way around both of these.  <float> didn't work,
though - I received errors when I tried it.
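
One possible shape for such a workaround, shown in plain Python so it runs as-is. It assumes the loop body only needs the number of elements at or above xmin plus a running sum over them, which may not match the real code. The function name and the log-ratio sum are illustrative assumptions, not taken from the original source.

```python
import math

def tail_count_and_logsum(z, xmin):
    """Single pass over z: no boolean mask, no filtered copy of the array."""
    n = 0
    logsum = 0.0
    for x in z:
        if x >= xmin:
            n += 1
            logsum += math.log(x / xmin)  # hypothetical per-element quantity
    return n, logsum

n, logsum = tail_count_and_logsum([0.5, 1.0, 2.0, 4.0], 1.0)
print(n, logsum)
```

With typed variables in Cython the same loop compiles to plain C, so nothing is allocated per pass.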

> line 80:  cf   = 1-(xmin/z)**a
>    This is again heavy on Python -- you might do better with a loop
> over z and use pow from math.h.

OK, I put cx and cf into loops and switched from ** to pow.
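
A minimal sketch of that rewrite for the cf line, again in plain Python. `cdf_loop` is a hypothetical name, and `math.pow` stands in for the C `pow` from math.h that the Cython version would call.

```python
import math

def cdf_loop(z, xmin, a):
    """Element-by-element version of cf = 1 - (xmin / z)**a."""
    cf = [0.0] * len(z)
    for i in range(len(z)):
        cf[i] = 1.0 - math.pow(xmin / z[i], a)
    return cf

print(cdf_loop([1.0, 2.0, 4.0], 1.0, 2.0))
```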

> Hope this helps (and let us know if it doesn't).

I found a factor of 2-3 improvement.  In particular, I had left one Python
float() call in, and that alone made Python and Cython run at ~the same
speed.  When I changed it to <float>, the Cython time dropped by a factor of 2.

Now execution times are (n = 2e4):
python ~ 3x fortran
cython ~ 1.5x fortran

So this may be as fast as I can get.  I'm a little confused about
fortran getting slower relative to python as n gets larger, but that
is probably some sort of failure on my part.
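
One way to check how the relative speed changes with n is to time both versions at several sizes and compare the ratios. The harness below is a sketch using only the standard library. `time_ratio` and `loop_sum` are made-up names, and the built-in `sum` versus a hand-written loop merely stand in for the real Fortran, Cython and Python implementations.

```python
import timeit

def loop_sum(data):
    """Deliberately slow stand-in: sum with an explicit Python loop."""
    total = 0.0
    for x in data:
        total += x
    return total

def time_ratio(fast, slow, sizes, repeat=3):
    """Return {n: slow_time / fast_time}, using the best of `repeat` runs."""
    ratios = {}
    for n in sizes:
        data = [1.0 + i for i in range(n)]
        t_fast = min(timeit.repeat(lambda: fast(data), number=1, repeat=repeat))
        t_slow = min(timeit.repeat(lambda: slow(data), number=1, repeat=repeat))
        ratios[n] = t_slow / max(t_fast, 1e-12)  # guard against a zero timing
    return ratios

print(time_ratio(sum, loop_sum, [1000, 10000]))
```

If the ratio drifts as n grows, fixed per-call overhead (such as wrapper or conversion costs) is a likely suspect, since it matters less for larger inputs.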

Thanks for the help!
Adam
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
