On Friday 22 May 2009 11:42:56, Gregor Thalhammer wrote:
> dmitrey wrote:
> > hi all,
> > has anyone already tried to compare using an ordinary numpy ufunc vs
> > that one from corepy, first of all I mean the project
> > http://socghop.appspot.com/student_project/show/google/gsoc2009/python/t124024628235
> >
> > It would be interesting to know what is speedup for (eg) vec ** 0.5 or
> > (if it's possible - it isn't pure ufunc) numpy.dot(Matrix, vec). Or
> > any other example.
>
> I have no experience with the mentioned CorePy, but recently I was
> playing around with accelerated ufuncs using Intel's Math Kernel Library
> (MKL). These improvements are now part of the numexpr package
> http://code.google.com/p/numexpr/
> Some remarks on possible speed improvements on recent Intel x86 processors.
> 1) basic arithmetic ufuncs (add, sub, mul, ...) in standard numpy are
> fast (SSE is used) and speed is limited by memory bandwidth.
> 2) the speed of many transcendental functions (exp, sin, cos, pow, ...)
> can be improved by _roughly_ a factor of five (single core) by using the
> MKL. Most of the improvements stem from using faster algorithms with a
> vectorized implementation. Note: the speed improvement depends on a
> _lot_ of other circumstances.
> 3) Improving performance by using multi cores is much more difficult.
> Only for sufficiently large (>1e5) arrays a significant speedup is
> possible. Where a speed gain is possible, the MKL uses several cores.
> Some experimentation showed that adding a few OpenMP constructs you
> could get a similar speedup with numpy.
> 4) numpy.dot uses optimized implementations.
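
As a rough illustration of points 1) and 2), here is a minimal sketch that times a basic arithmetic ufunc against a transcendental one using plain numpy. It is not a benchmark from this thread: the array size, the repeat count and the helper name best_time are assumptions made for the example, and the numbers will vary a lot with hardware and numpy build.

```python
import timeit

import numpy as np


def best_time(stmt, repeats=5):
    """Hypothetical helper: best wall-clock time (seconds) of one call to stmt()."""
    return min(timeit.repeat(stmt, number=1, repeat=repeats))


n = 1_000_000  # assumed size, well above the 1e5 threshold mentioned above
a = np.random.rand(n)
b = np.random.rand(n)
out = np.empty_like(a)  # preallocated so allocation cost is not timed

# Basic arithmetic: little work per element, so memory traffic tends to dominate.
t_add = best_time(lambda: np.add(a, b, out=out))

# Transcendental: much more work per element, so the CPU tends to dominate.
t_exp = best_time(lambda: np.exp(a, out=out))

print("add: %.2f ns/element" % (t_add / n * 1e9))
print("exp: %.2f ns/element" % (t_exp / n * 1e9))
```

If exp comes out several times slower per element than add on your machine, that gap is the room a faster vectorized implementation (or several cores) has to work with.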

Good points, Gregor. However, I wouldn't say that improving performance by
using multiple cores is *that* difficult, but rather that multiple cores can
only be used efficiently *whenever* memory bandwidth is not a limitation. An
example of this is the computation of transcendental functions, where, even
with vectorized implementations, the computation speed is still CPU-bound in
many cases. And you have seen very good speed-ups yourself for these cases
with your implementation of numexpr/MKL :)

Cheers,

--
Francesc Alted
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion