Travis Oliphant wrote:
> David Cournapeau wrote:
>
>> Hi,
>>
>> While profiling some code, I noticed that sum in numpy is kind of
>> slow once you use axis argument:
>>
>>
> Yes, this is expected because when using an axis argument, the
> following two things can happen
>
> 1) You may be skipping over large chunks of memory to get to the next
> available number and out-of-cache memory access is slow.
>
> 2) You have to allocate a result array.
>
>
>> import numpy as N
>> a = N.random.randn(1e5, 30)
>> %timeit N.sum(a) #-> 26.8ms
>> %timeit N.sum(a, 1) #-> 65.5ms
>> %timeit N.sum(a, 0) #-> 141ms
>>
>> Now, if I use some tricks, I get:
>>
>> %timeit N.sum(a) #-> 26.8 ms
>> %timeit N.dot(a, N.ones(a.shape[1], a.dtype)) #-> 11.3ms
>> %timeit N.dot(N.ones((1, a.shape[0]), a.dtype), a) #-> 15.5ms
>>
>> I realize that dot uses optimized libraries (atlas in my case) and all,
>> but is there any way to improve this situation ?
>>
>>
> Sum does *not* use an optimized library so it is not too surprising that
> you can get speed-ups using ATLAS.

I understand that there is no optimization going on with sum or multiply.
The dot timings were only meant as a point of comparison (this kind of
thing varies *a lot* across CPUs of the same architecture).

> It would be nice to do something to
> optimize the reduction functions in NumPy, but nobody has come forward
> with suggestions yet.
>

So is it possible to improve things? I noticed that sum, multiply and co.
are implemented with reduction functions. Should I follow the same scheme
as what I did for clip (basically following the dot-related optimization)?
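
For anyone who wants to reproduce the comparison, here is a minimal sketch of the two approaches discussed above. It only checks that the dot-based tricks give the same values as sum along each axis; it does not reproduce the timings, which depend on the CPU and the BLAS library in use. The array shape is written with an integer (100000 rather than 1e5), since newer NumPy releases reject float dimensions, and the seed is an arbitrary choice of mine.

```python
import numpy as N

N.random.seed(0)                      # arbitrary seed, only for repeatability
a = N.random.randn(100000, 30)        # same shape as in the timings above

# Plain reductions along each axis.
row_sums = N.sum(a, 1)                # one value per row, shape (100000,)
col_sums = N.sum(a, 0)                # one value per column, shape (30,)

# The same reductions written as products with vectors of ones,
# so that the work is handed to dot (and hence to BLAS when available).
row_sums_dot = N.dot(a, N.ones(a.shape[1], a.dtype))          # shape (100000,)
col_sums_dot = N.dot(N.ones((1, a.shape[0]), a.dtype), a)     # shape (1, 30)

# Both routes should agree up to floating-point rounding.
print(N.allclose(row_sums, row_sums_dot))
print(N.allclose(col_sums, col_sums_dot[0]))
```

Note that the second dot call returns a 2-d array of shape (1, 30), so it has to be indexed (or raveled) before it can replace N.sum(a, 0) directly.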

David
