>> Adam Ginsburg wrote:
>>> Much appreciated. I guess the various levels of yellow in the html
>>> file indicate the slow lines? I tried getting rid of all of my numpy
>>> calls in the loop by rewriting them as loops, but that hasn't improved
>>> speed at all, and in fact appears to have become slower. Right now a
>>> fortran (f2py) version goes ~75% faster and pure python goes ~25%
>>> faster, so I must be doing something wrong.
>>
>> Not necessarily.
>>
>> If your Fortran code is only 2x faster than Python there's usually not
>> much Cython can do. Cython is for the times when Fortran is 1000x-2000x
>> faster than Python!

That makes sense. I was very surprised to find that the fortran code was
only a few times faster than the python code; I suppose the problem I'm
trying to address must be intrinsically slow. I have lots of full-array
operations, but they are being done within long for loops (arrays of
size ~n done ~n times...).

> (What's the size of your test data though? If f2py overhead comes into
> play then Fortran could really be faster, meaning more of a potential for
> Cython. I didn't really look at your code though.)

My test data set is 10^4 elements, which is typical for what I expect to
deal with, but it could go up an order of magnitude. Of course, I need to
do 10^4 element sets ~10^4 times each...

> I'd add a mode='c' option to all the cnp.ndarray's -- this will speed up
> access.
>
> A future optimization would be the @cython.boundscheck(False) directive.

Do you mean future as in "don't try to use it now" or "should use it if
it's safe to proceed without boundary checking"? Also, this looks like a
decorator to me, but I couldn't compile when I put it in front of my
function definition.

> In the inner loops get rid of *all* python operations. For example:
>
> line 63: z = z[z>=xmin]
>    this is heavy on numpy operations (allocates & discards a temp
>    boolean array every time, etc) and might kill performance.
>
> line 64: n = float(...)  ==>  n = <float>(...)   # replace a Python
>    cast with a C-level cast

I think I came up with a way around both of these. <float> didn't work,
though - I received errors when I tried it.

> line 80: cf = 1-(xmin/z)**a
>    This is again heavy on Python -- you might do better with a loop
>    over z and use pow from math.h.

OK, I put cx and cf into loops and switched from ** to pow.

> Hope this helps (and let us know if it doesn't).

I found a factor of 2-3 improvement. In particular, I left one python
float() in, and that made python and cython go ~the same speed. When I
changed it to <float>, it dropped the cython time by a factor of 2.

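For reference, the loop rewrite suggested for lines 63 and 80 can be sketched
in plain Python. This is only an illustration of the loop structure, not the
actual code from the thread: the function name filtered_cf is hypothetical,
and it assumes the mask step and the power step can be fused into one pass
(the names z, xmin and a are taken from the quoted lines). In Cython the same
loop would additionally need typed variables and a C-level pow, for example
via "from libc.math cimport pow".

```python
from math import pow


def filtered_cf(z, xmin, a):
    """Single pass over z: keep values >= xmin and compute 1 - (xmin/z)**a.

    Hypothetical helper; fuses the two whole-array steps into one loop so
    that no temporary boolean mask or intermediate array is allocated.
    """
    cf = []
    for zi in z:
        if zi >= xmin:  # stands in for the mask step z = z[z >= xmin]
            cf.append(1.0 - pow(xmin / zi, a))  # stands in for 1-(xmin/z)**a
    return cf


print(filtered_cf([0.5, 1.0, 2.0, 4.0], xmin=1.0, a=2.0))
# -> [0.0, 0.75, 0.9375]
```

In pure Python this loop is not expected to beat the numpy version; the point
is that once every variable in it is C-typed, Cython can compile the body
without calling back into the Python interpreter.
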
Now execution times are:

n=2e4
python ~3x fortran
cython ~1.5x fortran

So this may be as fast as I can get. I'm a little confused about fortran
getting slower relative to python as n gets larger, but that is probably
some sort of failure on my part.

Thanks for the help!

Adam
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
